April 23, 2024, 6:45 p.m. | Youness Mansar

Towards Data Science - Medium towardsdatascience.com

Augmenting LLM Apps with a Voice Modality

Photo by Ian Harber on Unsplash

Most LLMs, particularly open-source ones, accept only text or, occasionally, text with images (Large Multimodal Models, or LMMs). But what if you want to talk to your LLM using your voice? Thanks to the powerful open-source speech-to-text models released in recent years, this is now achievable.

We will walk through integrating Llama 3 with a speech-to-text model, all within …

javascript, llama 3, llm, nicegui, speech recognition
