Subtitle Guidelines

The VoiceAI platform can generate subtitles for video content such as movies and TV shows. Transcribing spoken dialogue into text is only the first step: raw transcription rarely makes good subtitles, because subtitles must follow certain standards to stay readable and easy to follow. To address this, VoiceAI provides a customizable subtitle guidelines feature. You can set the maximum number of lines in a subtitle, the duration each subtitle should be displayed based on the audio length, and the maximum character count per line. These settings help keep your subtitles both accurate and viewer-friendly, which improves the accessibility of your content.

File Transcription Job with Subtitle Guidelines

Copy and paste the curl request below into your terminal to start a transcription using the API. Fill in the variables with the appropriate values, as described in the overview.

curl --location 'https://voice.neuralspace.ai/api/v2/jobs' \
--header 'Authorization: {{API_KEY}}' \
--form 'files=@"{{LOCAL_AUDIO_FILE_PATH}}"' \
--form 'config="{\"speaker_diarization\":{},\"subtitles_guidelines\":{\"line_count\":{{LINES}},\"duration\":{{DURATION}},\"character_count\":{{CHARACTER_COUNT}}}}"'

In the above request, the line_count, duration, and character_count parameters must be passed to set the guidelines. The request returns a response similar to that of the regular file transcription API, as shown in the overview.

{
  "success": true,
  "message": "Job created successfully",
  "data": {
    "jobId": "281f8662-cdc3-4c76-82d0-e7d14af52c46"
  }
}
Caution

To use the subtitle guidelines feature, you must also enable the speaker diarization feature.
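
The config form field in the request above is a JSON string, and escaping it by hand is easy to get wrong. As a minimal sketch, the string could instead be built in Python with the standard json module. The function name build_subtitle_config and the example values are illustrative assumptions, not part of the VoiceAI API; only the key names are taken from the request above.

```python
import json


def build_subtitle_config(line_count, duration, character_count):
    """Return the JSON string for the `config` form field.

    Hypothetical helper: only the key names come from the curl request;
    the function itself is not part of the VoiceAI API.
    """
    config = {
        # Speaker diarization must be enabled for subtitle guidelines to work.
        "speaker_diarization": {},
        "subtitles_guidelines": {
            "line_count": line_count,
            "duration": duration,
            "character_count": character_count,
        },
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Example values are placeholders, not recommended settings.
    print(build_subtitle_config(2, 7, 42))
```

The returned string can be passed as the value of the config form field. Because json.dumps produces balanced braces and correct quoting, no manual escaping is needed.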

Fetch Transcription and Subtitle Guideline Results

When you pass the jobId (received in response to the transcription API) to the API below, it fetches the status and results of the job.

curl --location 'https://voice.neuralspace.ai/api/v2/jobs/{{jobId}}' \
--header 'Authorization: {{API_KEY}}'

The response of the request above appears as follows:

{
  ...
  "data": {
    ...
    "result": {
      "transcription": {
        ...
        "segments": [
          {
            "startTime": 6.741562500000001,
            "endTime": 8.43,
            "text": "We've been at this for hours\nnow.",
            "speaker": "Speaker 1"
          },
          {
            "startTime": 8.43,
            "endTime": 9.55,
            "text": "Have you found anything useful\nin",
            "speaker": "Speaker 1"
          },
          {
            "startTime": 9.55,
            "endTime": 10.51,
            "text": "any of those books?",
            "speaker": "Speaker 1"
          },
          ...
        ]
      }
    }
  }
}
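
The same fetch request can be issued from Python. The sketch below uses only the standard library and reads the segments from the path shown in the sample response (data, result, transcription, segments). The helper names build_job_request and fetch_segments are hypothetical, and the sketch assumes the job has already finished; while a job is still running, the result may not be present yet.

```python
import json
import urllib.request

# Endpoint taken from the curl request earlier on this page.
API_BASE = "https://voice.neuralspace.ai/api/v2/jobs"


def build_job_request(job_id, api_key):
    """Build the GET request for a job (hypothetical helper)."""
    return urllib.request.Request(
        f"{API_BASE}/{job_id}",
        headers={"Authorization": api_key},
        method="GET",
    )


def fetch_segments(job_id, api_key):
    """Fetch a job and return its segments.

    Assumes the job is complete and the response has the shape
    shown in the sample above.
    """
    with urllib.request.urlopen(build_job_request(job_id, api_key)) as response:
        body = json.load(response)
    return body["data"]["result"]["transcription"]["segments"]
```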

When guidelines are enabled, the returned segments follow the guidelines you provided instead of the default segmentation. All other features and outputs work as usual.
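
Since each segment carries a start time, an end time, and text with line breaks, the segments map naturally onto a subtitle file. The following is a rough sketch of rendering them in the SubRip (.srt) format. It assumes startTime and endTime are in seconds, as the sample values suggest. The function names are illustrative and not part of the VoiceAI API.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of segment dicts as SRT text.

    Each segment is expected to have `startTime`, `endTime`
    (assumed to be seconds) and `text`, as in the sample response.
    """
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["startTime"])
        end = format_timestamp(segment["endTime"])
        # The "\n" inside `text` becomes the line break within a subtitle.
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text']}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

For example, the second segment in the sample response would be rendered with the timing line 00:00:08,430 --> 00:00:09,550, followed by its two lines of text.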

Troubleshooting and FAQ

Sub-issues? Check out our FAQ page. If you still need help, feel free to reach out to us directly at support@neuralspace.ai or join our Slack community.