Real-time Transcription
In this article, you will learn how to transcribe live audio from a microphone into text using our APIs.
Prerequisites
- Python packages
pip install pyaudio neuralspace
PyAudio depends on the PortAudio library, which must be installed through your operating system's package manager:
- For macOS
brew install portaudio
- For Debian/Ubuntu Linux
sudo apt install portaudio19-dev
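After installing, you can confirm that both packages import cleanly before running the full example. The sketch below uses a hypothetical helper, `check_imports` (our own name, not part of either library), that attempts each import and collects any failures. Actually importing the module, rather than only checking that it exists on disk, can also surface problems loading the PortAudio library that a simple presence check would miss.

```python
import importlib

# Import names used by the streaming snippet (installed via pip as
# "pyaudio" and "neuralspace").
REQUIRED = ["pyaudio", "neuralspace"]


def check_imports(names):
    """Try to import each module; return {name: error message} for failures."""
    problems = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # includes ModuleNotFoundError
            problems[name] = str(exc)
    return problems


if __name__ == "__main__":
    problems = check_imports(REQUIRED)
    if problems:
        for name, message in problems.items():
            print(f"{name}: {message}")
    else:
        print("All required packages imported successfully.")
```

Save it as a script and run it with the same Python interpreter you will use for the streaming code; an empty report means both packages are importable.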
Code Snippet
Run the following code to start real-time transcription. It streams microphone audio to the API and prints transcripts as they arrive.
import json
import threading
from queue import Queue
import pyaudio
import neuralspace as ns
q = Queue()
# callback for pyaudio: it runs on PortAudio's own thread, so it only
# enqueues the raw audio bytes and returns immediately
def listen(in_data, frame_count, time_info, status):
    q.put(in_data)
    return (None, pyaudio.paContinue)
# transfer audio chunks from the queue to the websocket
def send_audio(q, ws):
    while True:
        data = q.get()
        ws.send_binary(data)
# initialize VoiceAI
vai = ns.VoiceAI()
pa = pyaudio.PyAudio()
# open websocket connection
with vai.stream('en') as ws:
    # start pyaudio stream: 16 kHz, mono, 16-bit samples
    stream = pa.open(
        rate=16000,
        channels=1,
        format=pyaudio.paInt16,
        frames_per_buffer=4096,
        input=True,
        output=False,
        stream_callback=listen,
    )
    # start sending audio bytes on a new thread; daemon=True lets the
    # program exit even though send_audio() never returns
    t = threading.Thread(target=send_audio, args=(q, ws), daemon=True)
    t.start()
    print('Listening... (press Ctrl+C to stop)')
    try:
        # receive results on the current thread
        while True:
            resp = json.loads(ws.recv())
            text = resp['text']
            # optional output formatting: start a new line for every 'full'
            # utterance; partial results overwrite the current line
            if resp['full']:
                print('\r' + ' ' * 120, end='', flush=True)
                print(f'\r{text}', flush=True)
            else:
                if len(text) > 120:
                    text = f'...{text[-115:]}'
                print(f'\r{text}', end='', flush=True)
    except KeyboardInterrupt:
        print('\nStopping...')
    finally:
        # release the microphone and PortAudio resources
        stream.stop_stream()
        stream.close()
        pa.terminate()
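For reference, the audio settings above mean each callback delivers 4096 frames × 1 channel × 2 bytes (`paInt16`) = 8192 bytes, or 4096 / 16000 = 0.256 s of audio, so the sender thread pushes roughly 32,000 bytes per second to the websocket. The work is split across three threads: PyAudio's callback (which should return quickly, so it only enqueues), a sender thread that drains the queue into the websocket, and the main thread that reads results. In the snippet, the sender loops forever with no way to be told to stop. A common alternative, if you embed this client in a larger application, is to enqueue a sentinel object that tells the sender to exit. The sketch below shows that pattern with a hypothetical `FakeWebSocket` stand-in (not part of the NeuralSpace SDK) so it runs without a microphone or network connection.

```python
import queue
import threading

# Sentinel object: when the sender sees it, it stops draining the queue.
STOP = object()


class FakeWebSocket:
    """Hypothetical stand-in for the real websocket; records what it is sent."""

    def __init__(self):
        self.sent = []

    def send_binary(self, data):
        self.sent.append(data)


def send_audio(q, ws):
    """Drain audio chunks into the socket until the STOP sentinel arrives."""
    while True:
        data = q.get()
        if data is STOP:
            break
        ws.send_binary(data)


def run_demo(chunks):
    q = queue.Queue()
    ws = FakeWebSocket()
    sender = threading.Thread(target=send_audio, args=(q, ws), daemon=True)
    sender.start()
    # Producer side: in the real client, the PyAudio callback does these puts.
    for chunk in chunks:
        q.put(chunk)
    q.put(STOP)  # ask the sender to finish after the queued audio
    sender.join(timeout=5)  # bounded wait so shutdown cannot hang forever
    return ws.sent


if __name__ == "__main__":
    # Three buffers of 4096 frames of silent 16-bit mono audio (8192 bytes each).
    chunks = [bytes(8192) for _ in range(3)]
    sent = run_demo(chunks)
    print(len(sent), "chunks sent,", sum(map(len, sent)), "bytes")
```

Running it prints `3 chunks sent, 24576 bytes`. Because the queue is first-in, first-out and there is a single sender, chunks reach the socket in the order the callback produced them, and every chunk queued before the sentinel is still sent.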
Troubleshooting and FAQ
Having microphone problems? Check out our FAQ page. If you still need help, contact us at support@neuralspace.ai or join our Slack community.
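If the code starts but never prints any text, one thing worth checking is whether PyAudio is reading from the device you expect. The sketch below lists input-capable devices. `input_devices` and `list_microphones` are hypothetical helper names; `PyAudio.get_device_count()` and `PyAudio.get_device_info_by_index()` are standard PyAudio calls whose result dictionaries include `index`, `name`, and `maxInputChannels`. The filtering is kept in a separate function so it can be checked without audio hardware, and PyAudio is imported lazily inside `list_microphones`.

```python
def input_devices(device_infos):
    """Return (index, name, max input channels) for devices that can record."""
    return [
        (info["index"], info["name"], info["maxInputChannels"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_microphones():
    """Ask PyAudio for every device and keep only those with input channels."""
    import pyaudio  # imported here so input_devices() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # always release PortAudio, even if a query fails
    return input_devices(infos)
```

For example, `for index, name, channels in list_microphones(): print(index, name, channels)` prints one line per microphone. Once you know the index of the device you want, you can pass it to `pa.open(...)` as `input_device_index=<index>`.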