Skip to main content

Real-time Transcription

In this article you will learn how to convert an audio stream coming live from a microphone to text using our APIs.

Pre-requisites

  • Python packages
    pip install pyaudio neuralspace

PyAudio depends on the PortAudio library. It needs to be installed via your OS package manager.

  • For Mac OS X
    brew install portaudio
  • For Debian/Ubuntu Linux
    apt install portaudio19-dev

Code Snippet

You can run the following code to start real-time streaming.

import json
import threading
from queue import Queue

import pyaudio
import neuralspace as ns

q = Queue()

# callback for pyaudio to fill up the queue
def listen(in_data, frame_count, time_info, status):
q.put(in_data)
return (None, pyaudio.paContinue)

# transfer from queue to websocket
def send_audio(q, ws):
while True:
data = q.get()
ws.send_binary(data)

# initialize VoiceAI
vai = ns.VoiceAI()
pa = pyaudio.PyAudio()

# open websocket connection
with vai.stream('en') as ws:
# start pyaudio stream
stream = pa.open(
rate=16000,
channels=1,
format=pyaudio.paInt16,
frames_per_buffer=4096,
input=True,
output=False,
stream_callback=listen,
)
# start sending audio bytes on a new thread
t = threading.Thread(target=send_audio, args=(q, ws))
t.start()
print('Listening...')
# start receiving results on the current thread
while True:
resp = ws.recv()
resp = json.loads(resp)
text = resp['text']
# optional output formatting; new lines on every 'full' utterance
if resp['full']:
print('\r' + ' ' * 120, end='', flush=True)
print(f'\r{text}', flush=True)
else:
if len(text) > 120:
text = f'...{text[-115:]}'
print(f'\r{text}', end='', flush=True)

Troubleshooting and FAQ

Mic problems? Check out our FAQ page. If you still need help, feel free to reach out to us directly at support@neuralspace.ai or join our Slack community.