Transcript
The Transcript class is designed to handle speech-to-text outputs generated by machine learning models, such as OpenAI's Whisper. It supports outputs that include word-level timestamps.
Constructing a Transcript
You typically create a Transcript instance from JSON data. The JSON should adhere to the following structure:
type Captions = {
  token: string; // The spoken word
  start: number; // The start in milliseconds
  stop: number; // The stop in milliseconds
}[][];
The JSON structure is a two-dimensional array: the outer array represents sentences, and each sentence is an array of word (token) objects. This structure preserves the semantic grouping of words.
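To make the shape concrete, here is a small sketch of a value matching the Captions type. The sample words and timings are invented for illustration, and `isValidCaptions` is a hypothetical helper, not part of @diffusionstudio/core.

```typescript
// Shape described above: an array of sentences, each an array of timed words.
type Captions = {
  token: string; // The spoken word
  start: number; // Start time in milliseconds
  stop: number; // Stop time in milliseconds
}[][];

// Hypothetical sample data: two sentences with word-level timestamps.
const captions: Captions = [
  [
    { token: 'Hello', start: 0, stop: 300 },
    { token: 'World', start: 320, stop: 600 },
  ],
  [
    { token: 'How', start: 900, stop: 1050 },
    { token: 'are', start: 1060, stop: 1200 },
    { token: 'you', start: 1210, stop: 1500 },
  ],
];

// Hypothetical helper (not a library function): checks that every word has a
// non-negative start time and does not stop before it starts.
function isValidCaptions(data: Captions): boolean {
  return data.every((sentence) =>
    sentence.every((word) => word.start >= 0 && word.stop >= word.start),
  );
}

// Total number of words across all sentences.
const wordCount = captions.reduce((sum, sentence) => sum + sentence.length, 0);
```

A check like this can be useful before handing model output to the library, since transcription models occasionally emit malformed timestamps.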
To create a Transcript from this JSON, use the following:
import { Transcript } from '@diffusionstudio/core';
const transcript = Transcript.fromJSON(captions); // `captions` is of type Captions
Manual Construction
You can also manually create a Transcript instance:
import { Transcript, WordGroup, Word } from '@diffusionstudio/core';
const transcript = new Transcript([
  new WordGroup([
    new Word('Hello', 0, 300),
    new Word('World', 320, 600),
  ])
]);
Utility Methods
The Transcript class provides several utility methods:
transcript.optimize();
transcript.toSRT();
transcript.slice(20);
optimize(): Adjusts the timestamps of words to improve readability when aligned on a timeline.
toSRT(): Converts the transcript to an SRT-format blob, which can be downloaded and used with most video editing applications.
slice(wordCount: number): Creates a new Transcript containing only the specified number of words. This is useful for generating preview captions.
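For readers unfamiliar with SRT, the sketch below shows how word groups with millisecond timestamps could be rendered as SRT cues (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text). It is not the library's toSRT() implementation; `TimedWord`, `formatSrtTimestamp`, and `groupsToSrt` are hypothetical names used only for illustration, and the sketch returns a string rather than a blob.

```typescript
// Hypothetical word shape, mirroring the Captions JSON structure.
type TimedWord = { token: string; start: number; stop: number };

// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Render each non-empty group of words as one numbered SRT cue.
// Cues are separated by a blank line, as the SRT format expects.
function groupsToSrt(groups: TimedWord[][]): string {
  return groups
    .filter((group) => group.length > 0)
    .map((group, index) => {
      const start = formatSrtTimestamp(group[0].start);
      const stop = formatSrtTimestamp(group[group.length - 1].stop);
      const text = group.map((word) => word.token).join(' ');
      return `${index + 1}\n${start} --> ${stop}\n${text}\n`;
    })
    .join('\n');
}
```

In this sketch each group spans from the start of its first word to the stop of its last word; the library may choose cue boundaries differently.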
Iterating Over Words
The Transcript class offers a powerful iteration method via the iter function:
for (const group of transcript.iter({ count: [2] })) {
  // Each group will contain up to two words
}
The iter method lets you iterate over words using various options. If two values are provided for an option, a random number between them is chosen, which introduces a degree of randomness to improve captioning quality.
These are the available options for iteration:
count: iterate by word count
duration: iterate by group duration
length: iterate by the number of characters
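The count option can be pictured with the following sketch, which splits a flat list of words into groups of a fixed or randomly chosen size. This is an assumption-laden illustration, not the library's iter implementation: `groupByCount` and `TimedWord` are hypothetical names, the random range is assumed to be inclusive on both ends, and the `random` parameter exists only so the behaviour can be made deterministic.

```typescript
// Hypothetical word shape, mirroring the Captions JSON structure.
type TimedWord = { token: string; start: number; stop: number };

// Split words into consecutive groups. `count` holds one value for a fixed
// group size, or two values [min, max] for a size drawn at random per group
// (assumed inclusive). The last group may be shorter than the chosen size.
function groupByCount(
  words: TimedWord[],
  count: number[],
  random: () => number = Math.random,
): TimedWord[][] {
  const first = count.length > 0 ? count[0] : 1;
  const second = count.length > 1 ? count[1] : first;
  const min = Math.max(1, Math.floor(first));
  const max = Math.max(min, Math.floor(second));

  const groups: TimedWord[][] = [];
  let index = 0;
  while (index < words.length) {
    // random() is in [0, 1), so size always falls within [min, max].
    const size = min + Math.floor(random() * (max - min + 1));
    groups.push(words.slice(index, index + size));
    index += size;
  }
  return groups;
}
```

With `count: [2]` every group has two words except possibly the last, which matches the "up to two words" comment in the iteration example. The duration and length options could be sketched the same way by accumulating word durations or character counts instead of word counts.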