Dec. 20, 2023, 4:22 p.m. | /u/sanchitgandhi99

Machine Learning www.reddit.com

Speculative decoding gives 2x faster Whisper inference while ensuring exactly the same outputs, making it the perfect drop-in replacement for existing Whisper pipelines ⚡️

![Figure from the post](https://preview.redd.it/nq1xnd517h7c1.png?width=2604&format=png&auto=webp&s=5e2b9f662f5bbb7fb0183e0f822fe2bb86330e16)

Check out the [blog post](https://huggingface.co/blog/whisper-speculative-decoding) and accompanying Google Colab, or continue reading for details 👇

## How does it work? 🧐

Speculative decoding uses a smaller, faster model to assist the generation of a slower, larger one 🤝 The smaller model auto-regressively drafts candidate tokens, and the larger model then checks them with a single validation forward pass, accepting every draft token that matches its own prediction. Because a token is only emitted when the larger model agrees with it, the final output is guaranteed to be identical to what the larger model would have generated on its own, but with far fewer slow auto-regressive passes through the large model.
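The draft-then-verify loop can be sketched with toy stand-in "models" (plain Python functions rather than real networks; `large_model`, `small_model`, and all names here are illustrative, not the actual Whisper implementation). The point of the sketch is the correctness guarantee: the result matches plain greedy decoding with the large model alone.

```python
def large_model(prefix):
    # Toy stand-in for the slow, accurate model: the next token is a
    # deterministic function of the prefix (greedy decoding).
    return (sum(prefix) * 7 + len(prefix)) % 10

def small_model(prefix):
    # Toy stand-in for the fast draft model: agrees with the large
    # model most of the time, but not always.
    tok = large_model(prefix)
    return (tok + 1) % 10 if len(prefix) % 4 == 3 else tok

def greedy_decode(model, prompt, n_tokens):
    # Ordinary auto-regressive greedy decoding: one model call per token.
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(model(seq))
    return seq

def speculative_decode(prompt, n_tokens, n_draft=4):
    seq = list(prompt)
    target_len = len(prompt) + n_tokens
    while len(seq) < target_len:
        # 1. Draft: the small model proposes n_draft tokens auto-regressively.
        draft = greedy_decode(small_model, seq, n_draft)[len(seq):]
        # 2. Verify: the large model checks the draft tokens (in a real
        #    implementation this is a single batched forward pass).
        for tok in draft:
            correct = large_model(seq)
            if tok == correct:
                seq.append(tok)       # draft token accepted
            else:
                seq.append(correct)   # mismatch: keep the large model's token
                break                 # and discard the rest of the draft
            if len(seq) == target_len:
                break
    return seq

prompt = [1, 2, 3]
assert speculative_decode(prompt, 20) == greedy_decode(large_model, prompt, 20)
```

Every emitted token is, by construction, exactly the token the large model would have chosen at that position, which is why the outputs match bit-for-bit; the speed-up in the real system comes from verifying a whole draft in one forward pass instead of generating token by token.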
