Feb. 29, 2024, 12:09 a.m. | /u/SaladChefs

Machine Learning www.reddit.com

# Speech-to-text benchmark with Parakeet TDT 1.1B

Our previous speech-to-text benchmarks, [Whisper Large V3](https://www.reddit.com/r/MachineLearning/comments/1ar08br/p_whisper_large_v3_benchmark_1_million_hours/) (11,736 mins/$) and [Whisper Large V2](https://www.reddit.com/r/MachineLearning/comments/16ftd9v/p_whisper_large_benchmark_137_days_of_audio/) (1,681 mins/$), generated a healthy discussion here.

Next on our list of open-source STT models is **Parakeet TDT 1.1B**, which turned out to be the winner.

In this benchmark, we transcribed **17,305 hours** of CommonVoice (en) audio, spread across 5,209,130 audio files, to text.
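As a rough sanity check on those dataset figures, the average clip length follows directly from the two totals. This is a minimal sketch using only the numbers quoted above; the function name `average_clip_seconds` is hypothetical and not part of the benchmark code.

```python
# Totals as quoted in the post.
TOTAL_HOURS = 17_305
TOTAL_FILES = 5_209_130


def average_clip_seconds(total_hours: float, total_files: int) -> float:
    """Mean clip duration in seconds (hypothetical helper for illustration)."""
    return total_hours * 3600 / total_files


avg = average_clip_seconds(TOTAL_HOURS, TOTAL_FILES)
print(f"Average clip length: {avg:.2f} s")  # prints "Average clip length: 11.96 s"
```

So the workload is made up of very short clips, about 12 seconds each on average.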

# Benchmark results:

Parakeet TDT 1.1B on an **RTX 3070 Ti** delivered **47,638 minutes per …
