Transcribe an audio file

This article shows how to transcribe an audio file using NeuralSpace's VoiceAI platform. Follow along with your preferred method (UI, API, or Python SDK) by selecting the appropriate tab below.

Prerequisites

Make sure to follow Get Started to sign up and complete all the necessary prerequisites.
If you are using the API or the SDK, save your API key in a variable called NS_API_KEY before moving ahead, using the command below:

export NS_API_KEY=YOUR_API_KEY

Refer to the Supported Languages page for language codes and supported domains.

Sample Audio Files

You can experiment with the sample audio files in English here and Arabic here.

Start a File Transcription Job

Copy and paste the curl request below into your terminal to start a transcription using the API. Replace the variables with the appropriate values.

curl --location 'https://voice.neuralspace.ai/api/v1/jobs' \
--header "Authorization: $NS_API_KEY" \
--form 'files=@"{{LOCAL_AUDIO_FILE_PATH}}"' \
--form 'config="{\"file_transcription\":{\"language_id\":\"{{LANG}}\", \"mode\":\"advanced\", \"number_formatting\":\"{{NUMBER_FORMATTING}}\"}}"'
  • NS_API_KEY (required): Your API key from the platform.
  • LOCAL_AUDIO_FILE_PATH (required): Path on your local machine to the audio file that you want to transcribe.
  • LANG (optional): Language ID corresponding to the source audio file. Refer to Supported Languages for the language IDs. If no language ID is passed, the language is auto-detected using AI.
  • MODE (required): fast or advanced, depending on whether you want higher speed or higher accuracy. Currently, only advanced is supported.
  • NUMBER_FORMATTING (optional): digits or words, depending on whether you want numbers in the output formatted as digits or as words. If this argument is not passed, the transcript is returned without any additional formatting.
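If you prefer Python over curl, the same request can be sketched with the third-party requests library. This is an illustration, not the official SDK; the endpoint, header, and field names are taken from the curl example above, and build_config/start_job are hypothetical helper names.

```python
import json
import os

def build_config(language_id=None, mode="advanced", number_formatting=None):
    """Assemble the `config` form field for the file-transcription job request.

    Optional fields are omitted entirely when not provided, matching the
    behaviour described in the variable table above.
    """
    file_transcription = {"mode": mode}
    if language_id:
        file_transcription["language_id"] = language_id
    if number_formatting:
        file_transcription["number_formatting"] = number_formatting
    return json.dumps({"file_transcription": file_transcription})

def start_job(audio_path, **config_kwargs):
    """POST the audio file and config to the jobs endpoint; return the jobId."""
    import requests  # third-party; pip install requests
    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://voice.neuralspace.ai/api/v1/jobs",
            headers={"Authorization": os.environ["NS_API_KEY"]},
            files={"files": f},
            data={"config": build_config(**config_kwargs)},
        )
    resp.raise_for_status()
    return resp.json()["data"]["jobId"]
```

For example, `start_job("audio.mp3", language_id="en", number_formatting="digits")` would mirror the curl request above with all three options set.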
More Configurations

Apart from the required configurations that have been passed in the example above, we support more optional configurations as well. Please refer to the API Reference for more details on how to pass them in the request.

The curl command above returns the details of the created job, including its jobId and an error message, if any. An example response is given below.

{
  "success": true,
  "message": "Job created successfully",
  "data": {
    "jobId": "281f8662-cdc3-4c76-82d0-e7d14af52c46"
  }
}
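The jobId needed later to fetch results can be pulled out of this response. For example, in Python:

```python
import json

# Example response body from the job-creation call above.
response_body = '''
{
  "success": true,
  "message": "Job created successfully",
  "data": {
    "jobId": "281f8662-cdc3-4c76-82d0-e7d14af52c46"
  }
}
'''

payload = json.loads(response_body)
assert payload["success"], payload["message"]
job_id = payload["data"]["jobId"]
print(job_id)  # 281f8662-cdc3-4c76-82d0-e7d14af52c46
```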
Mode: Fast vs Advanced

This option enables you to select between having higher speed or higher accuracy.

  • Fast: Our fast model is optimized for fast turnaround time and runtime efficiency.
  • Advanced: Our advanced model is optimized for the best possible accuracy.
Note: Only advanced mode is supported right now. fast will be added in the next release in December.

Number formatting: Digits vs Words

The example below illustrates how the transcript is formatted with each option.

  • Raw transcript: I live on Street 23, and turned eighteen yesterday!
  • Formatted as digits: I live on Street 23, and turned 18 yesterday!
  • Formatted as words: I live on Street twenty three, and turned eighteen yesterday!

Fetch Transcription Results

When you pass the jobId (received in the response to the transcription API) to the API below, it fetches the status and results of the job.

curl --location 'https://voice.neuralspace.ai/api/v1/jobs/{{jobId}}' \
--header "Authorization: $NS_API_KEY"

If the status reads Completed, the job was successful. Failed means the job could not be completed.

The response to the request above looks like this:

{
  "success": true,
  "message": "Data fetched successfully",
  "data": {
    "timestamp": 1692769864642,
    "filename": "santa-claus-a-reading-christmas-story-17777.mp3",
    "jobId": "329ff79f-a540-4536-9ef6-95a538d4d597",
    "filePath": "uploads/24444581-a379-4c18-9f0b-d93336b4dddb",
    "params": {
      "file_transcription": {
        "language_id": "en",
        "mode": "advanced"
      }
    },
    "status": "Completed",
    "audioDuration": 248.328,
    "progress": [
      "queued",
      "Started",
      "transcription Started",
      "transcription Completed",
      "Completed"
    ],
    "result": {
      "transcription": {
        "save_path": "uploads/stt-329ff79f-a540-4536-9ef6-95a538d4d597.json",
        "transcript": "Towards the night before Christmas, when all through the house, not a creature was stirring. Oh, not even a mouse! The stockings were hung by the chimney with care in hopes that St. Nicholas, oh, that's me, soon would be there. The children were nestled all snug in their beds. while visions of sugar plums danced in their heads, and Mama in her kerchief, I in my cap, had just settled down for a long winter snap ... and laying his finger aside at his nose and giving a nod, the chimney he rode, his sprang to his sleigh to his steam gave a whistle, and away they all flew like the down of a thistle. But I heard him exclaim as he drove out of sight, Merry Christmas to all and to all. Good night.",
        "timestamps": [
          {
            "word": "Towards",
            "start": 1.92,
            "end": 2.2,
            "conf": 1
          },
          {
            "word": "the",
            "start": 2.2,
            "end": 2.42,
            "conf": 1
          },
          {
            "word": "night",
            "start": 2.42,
            "end": 2.66,
            "conf": 1
          },
          {
            "word": "before",
            "start": 2.66,
            "end": 3.02,
            "conf": 0.69
          },
          ...
        ]
      }
    }
  }
}

Along with the transcript, the API and SDK return word-level timestamps by default; timestamps are also included when you download the results as JSON from the UI. They let you pinpoint exactly when each word was spoken, which is useful for detailed analysis of spoken content. Refer to timestamps for more information.
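As one illustration of what the timestamps enable, the snippet below flags low-confidence words for manual review. The sample data mirrors the shape of result.transcription.timestamps in the response above; the 0.8 threshold and the helper name are arbitrary choices for this sketch.

```python
def low_confidence_words(timestamps, threshold=0.8):
    """Return (word, start, conf) tuples whose confidence is below threshold."""
    return [(t["word"], t["start"], t["conf"])
            for t in timestamps if t["conf"] < threshold]

# Entries in the shape returned under result.transcription.timestamps.
sample = [
    {"word": "Towards", "start": 1.92, "end": 2.2, "conf": 1},
    {"word": "the", "start": 2.2, "end": 2.42, "conf": 1},
    {"word": "night", "start": 2.42, "end": 2.66, "conf": 1},
    {"word": "before", "start": 2.66, "end": 3.02, "conf": 0.69},
]
print(low_confidence_words(sample))  # [('before', 2.66, 0.69)]
```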

Troubleshooting and FAQ

No transcript? Check out our FAQ page. If you still need help, feel free to reach out to us directly at support@neuralspace.ai or join our Slack community.