

Real-time Latency

On a T4 machine, the latency for real-time transcription is:

Real-time Transcription
Partial results: 140 ms + network latency
Full results: 450 ms + network latency
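As a rough illustration of how these figures combine with network latency, the following sketch adds a measured network latency to the processing latencies listed above. The function name and the 50 ms network figure are hypothetical and not part of any NeuralSpace API; substitute a value measured from your own environment.

```python
# Processing latencies quoted above for a T4 machine, in milliseconds.
PARTIAL_RESULT_MS = 140.0
FULL_RESULT_MS = 450.0


def expected_latency_ms(processing_ms: float, network_ms: float) -> float:
    """Return processing latency plus network latency, in milliseconds."""
    if network_ms < 0:
        raise ValueError("network_ms must be non-negative")
    return processing_ms + network_ms


# Hypothetical network latency; measure your own connection instead.
network_ms = 50.0

print(expected_latency_ms(PARTIAL_RESULT_MS, network_ms))  # 190.0
print(expected_latency_ms(FULL_RESULT_MS, network_ms))     # 500.0
```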

File Transcription Processing Time

For file transcription, the processing speed with all configurations enabled, expressed as a real-time factor (RTF), is:

File Transcription
GPU: 7x-10x RTF

Real-time factor (RTF) is the ratio of a file's audio duration to the time it takes to transcribe it.

For example, a 1x RTF means a 10-minute audio file takes 10 minutes to process, while a 10x RTF means it takes only 1 minute.
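The arithmetic can be sketched in a few lines of Python. This is a minimal illustration that assumes the definition used on this page (RTF = audio duration divided by processing time); the function names are hypothetical and are not part of any NeuralSpace SDK.

```python
def processing_time_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate processing time for a file, given an RTF as defined on this page."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Compute RTF from a measured run: audio duration / processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


ten_minutes = 10 * 60  # a 10-minute audio file, in seconds

for rtf in (1, 7, 10):
    seconds = processing_time_seconds(ten_minutes, rtf)
    print(f"{rtf}x RTF: {seconds:.1f} s")
# 1x RTF: 600.0 s
# 7x RTF: 85.7 s
# 10x RTF: 60.0 s
```

At the 7x-10x range quoted above, a 10-minute file would therefore take roughly 60 to 86 seconds to process.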

Troubleshooting and FAQ

For accuracy benchmarking, view our latest results on the NeuralSpace blog. Want more detailed benchmarks on different machines? Feel free to reach out to us directly or join our Slack community.