Text to Speech
Text-to-Speech (TTS) technology transforms written text into spoken words. Our TTS service produces realistic, human-like voice output, which makes it useful both for personal accessibility and for improving the user experience on digital platforms.
It is pivotal in creating digital avatars and voice bots that offer more engaging and interactive user experiences. In the realm of customer service, TTS powers bots for call centers, enabling efficient and human-like interactions for queries and support. It's also instrumental in developing voice assistants and smart home devices, facilitating convenient voice commands and information retrieval. Additionally, TTS is used in content creation for voiceovers and narration, providing a cost-effective and efficient solution for producing diverse audio content.
Prerequisites
Follow Get Started to sign up and complete the necessary prerequisites.
If you are using the APIs or the SDK, save your API key in an environment variable called NS_API_KEY before moving ahead, using the command below:
export NS_API_KEY=YOUR_API_KEY
Refer to the Voices page for speaker IDs of available voices in different languages.
Start a Speech Synthesis Job
- API
- Python SDK
Copy and paste the curl request below into your terminal to generate audio from your text using the API. Replace the example values with your own.
curl --location 'https://voice.neuralspace.ai/api/v2/tts' \
--header "Authorization: $NS_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"text": "مرحبا بالعالم",
"speaker_id": "ar-male-Omar-saudi-neutral"
}'
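For callers who want to send the same request from Python without the SDK, the curl call above can be sketched with the standard library. This is a minimal sketch, not an official client: the endpoint, headers, and body fields are taken from the curl example, while the helper names `build_tts_request` and `send_tts_request` are hypothetical and the response is assumed to be JSON (the non-streaming case).

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
TTS_URL = "https://voice.neuralspace.ai/api/v2/tts"


def build_tts_request(text, speaker_id, api_key=None):
    """Build (but do not send) a POST request equivalent to the curl call.

    Hypothetical helper for illustration; it is not part of the neuralspace SDK.
    """
    # Fall back to the NS_API_KEY environment variable, as in the curl example.
    api_key = api_key or os.environ.get("NS_API_KEY")
    if not api_key:
        raise ValueError("Set NS_API_KEY or pass api_key explicitly")
    body = json.dumps({"text": text, "speaker_id": speaker_id}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_tts_request(request, timeout=30):
    """Send the request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_tts_request(build_tts_request("مرحبا بالعالم", "ar-male-Omar-saudi-neutral"))` would then perform the same call as the curl command. Building and sending are kept separate so the request can be inspected before any network traffic happens.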
Make sure the neuralspace package is installed in your Python environment. Install it with pip and set the environment variable for your API key using the commands below.
pip install neuralspace
export NS_API_KEY=YOUR_API_KEY
Once the package and API key are set up, run the following Python code snippet to create a TTS job.
import neuralspace as ns
# Create the client; the API key is taken from the NS_API_KEY variable set above
vai = ns.VoiceAI()
# or pass the key explicitly:
# vai = ns.VoiceAI(api_key='YOUR_API_KEY')
# TTS job configuration
data = {
    "text": "كيف حالك",
    "speaker_id": "ar-female-Nadia-saudi-neutral"
}
# Generate audio from the provided text
result = vai.synthesize(data=data)
print(f'Output:\n{result}')
Data Parameters | Required | Description |
---|---|---|
text | Yes | The text to synthesize into speech. |
speaker_id | Yes | Speaker ID of the voice to use (listed on the Voices page). |
In addition to the required parameters passed in the example above, several optional parameters are supported, as listed below. Pass them inside the data dictionary. Refer to the API Reference for more details on how to pass them in the request.
"data": {
    ...
    "stream": True,
    "config": {
        "pace": 1.0,
        "volume": 1.0,
        "pitch_shift": 0,
        "pitch_scale": 1.0
    },
    "sample_rate": 16000
}
- stream: Enable streaming to directly get the audio generated as bytes instead of a file download link.
- config: Control the pace, volume, and pitch of the generated audio, through their respective parameters.
- sample_rate: Sample rate of the output audio. Currently 8000, 16000, 22050, and 24000 are supported.
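The optional parameters above can be assembled and checked before a request is sent. The sketch below is illustrative only: the field names and the supported sample rates come from this page, but the helper `build_tts_payload` is hypothetical, and rejecting unknown `config` keys assumes the four keys shown above are the only ones accepted.

```python
# Values listed on this page as currently supported.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000)
# Assumption: these four keys, shown in the example above, are the only config keys.
KNOWN_CONFIG_KEYS = {"pace", "volume", "pitch_shift", "pitch_scale"}


def build_tts_payload(text, speaker_id, *, stream=False, sample_rate=None, config=None):
    """Return a data dictionary for a TTS job (hypothetical helper, not SDK code)."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if not speaker_id:
        raise ValueError("speaker_id must be a non-empty string")
    if sample_rate is not None and sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}, got {sample_rate}"
        )
    unknown = set(config or {}) - KNOWN_CONFIG_KEYS
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")

    # Required parameters first; optional ones are added only when requested.
    payload = {"text": text, "speaker_id": speaker_id}
    if stream:
        payload["stream"] = True
    if config:
        payload["config"] = dict(config)
    if sample_rate is not None:
        payload["sample_rate"] = sample_rate
    return payload
```

The resulting dictionary has the same shape as the `data` argument in the SDK example, so it could be passed as `vai.synthesize(data=build_tts_payload(...))`. Validating locally surfaces an unsupported sample rate before a job is created.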
When a request is sent via the curl command or the SDK code snippet above, the response contains a download link for the generated audio file, along with other job details and an error message, if any. The example response below is returned when stream is set to False. When stream is True, only a byte array is returned.
{
    "success": true,
    "message": "Job created successfully",
    "data": {
        "jobId": "b2d4bcb2-f7a6-453d-84eb-00f796f23880",
        "timestamp": 1701765394885,
        "result": {
            "save_path": "https://largefilestoreprod.blob.core.windows.net/common/uploads/6272df27-81a6-442a-bb7a-f98b63243604"
        }
    }
}
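A response of this shape can be turned into a local audio file in two steps: read the download link, then fetch it. The following is a minimal sketch, assuming the non-streaming response has already been parsed into a Python dictionary. The helper names `extract_audio_url` and `download_audio` are hypothetical, and the sketch assumes the link in `save_path` can be fetched without extra headers, which this page does not state.

```python
import shutil
import urllib.request


def extract_audio_url(response):
    """Return data.result.save_path from a parsed non-streaming TTS response."""
    if not response.get("success"):
        # Surface the server-provided message when the job did not succeed.
        raise RuntimeError(f"TTS job failed: {response.get('message', 'unknown error')}")
    try:
        return response["data"]["result"]["save_path"]
    except (KeyError, TypeError) as exc:
        raise RuntimeError("Response does not contain data.result.save_path") from exc


def download_audio(url, destination, timeout=60):
    """Stream the file at url to destination (assumes no authentication is needed)."""
    with urllib.request.urlopen(url, timeout=timeout) as remote:
        with open(destination, "wb") as local:
            shutil.copyfileobj(remote, local)
    return destination
```

With the SDK, the flow might look like `download_audio(extract_audio_url(result), "speech_output")`, where `result` is the value returned by `vai.synthesize`. The output file name and its format are left to the caller because the example response does not state the audio format.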