[P] Speech-to-Text Benchmark: 47,638 mins transcribed per $1 on RTX 3070 Ti (1000-fold cost reduction compared with managed services)

Feb. 29, 2024, 12:09 a.m. | /u/SaladChefs

Machine Learning www.reddit.com

# Speech-to-text benchmark with Parakeet TDT 1.1B

Our previous speech-to-text benchmarks, the [Whisper Large V3 benchmark](https://www.reddit.com/r/MachineLearning/comments/1ar08br/p_whisper_large_v3_benchmark_1_million_hours/) (11,736 mins/$) and the [Whisper Large V2 benchmark](https://www.reddit.com/r/MachineLearning/comments/16ftd9v/p_whisper_large_benchmark_137_days_of_audio/) (1,681 mins/$), generated a healthy discussion here.

Next on our list of open-source STT models is **Parakeet TDT 1.1B**, which turned out to be the winner.

In this benchmark, we transcribed **17,305 hours** of CommonVoice (en) audio, spread across 5,209,130 audio files, to text.
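To put the dataset size in perspective, here is a small back-of-the-envelope sketch using only the two figures quoted above. The variable names are illustrative, and the "average clip length" is simply total duration divided by file count, not a number reported in the post.

```python
# Figures quoted in the post.
TOTAL_HOURS = 17_305        # hours of CommonVoice (en) audio
TOTAL_FILES = 5_209_130     # number of audio files

# Derived quantities (plain arithmetic, not reported in the post).
total_minutes = TOTAL_HOURS * 60
avg_clip_seconds = TOTAL_HOURS * 3600 / TOTAL_FILES

print(f"{total_minutes:,} minutes of audio in total")
print(f"{avg_clip_seconds:.1f} s per file on average")
```

The result (roughly 12 seconds per file) suggests the workload consists of a very large number of short clips, so per-file overhead may matter as much as raw model throughput.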

# Benchmark results:

Parakeet TDT 1.1B on an **RTX 3070 Ti** delivered **47,638 minutes per …
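As a rough illustration of what these mins/$ figures mean, the sketch below converts each quoted rate into the cost of transcribing this benchmark's 17,305 hours. It assumes the Whisper rates from the earlier benchmarks would carry over unchanged to this dataset, which the post does not claim; the "implied managed-service rate" is derived solely from the title's 1000-fold figure and is not a quoted price.

```python
# Minutes transcribed per dollar, as quoted in this post and the two linked ones.
MINS_PER_DOLLAR = {
    "Parakeet TDT 1.1B": 47_638,
    "Whisper Large V3": 11_736,
    "Whisper Large V2": 1_681,
}

TOTAL_MINUTES = 17_305 * 60  # size of this benchmark's dataset


def cost_usd(minutes: float, mins_per_dollar: float) -> float:
    """Cost of transcribing `minutes` of audio at a given mins/$ rate."""
    return minutes / mins_per_dollar


# Hypothetical cost of this dataset at each rate (assumes rates transfer).
costs = {name: cost_usd(TOTAL_MINUTES, rate) for name, rate in MINS_PER_DOLLAR.items()}

# Relative improvement of Parakeet over the earlier Whisper benchmarks.
parakeet = MINS_PER_DOLLAR["Parakeet TDT 1.1B"]
ratio_vs_v3 = parakeet / MINS_PER_DOLLAR["Whisper Large V3"]
ratio_vs_v2 = parakeet / MINS_PER_DOLLAR["Whisper Large V2"]

# Rate implied by the title's "1000-fold" claim (an inference, not a quoted price).
implied_managed_mins_per_dollar = parakeet / 1000
implied_managed_usd_per_min = 1 / implied_managed_mins_per_dollar

for name, usd in costs.items():
    print(f"{name}: ${usd:,.2f}")
print(f"Parakeet vs Whisper V3: {ratio_vs_v3:.1f}x, vs V2: {ratio_vs_v2:.1f}x")
print(f"Implied managed-service price: ${implied_managed_usd_per_min:.3f}/min")
```

Under these assumptions, the full dataset would cost about $22 at the Parakeet rate, versus about $88 and $618 at the two Whisper rates.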
