April 7, 2024, 7:17 p.m. | /u/rumble_ftw

Machine Learning www.reddit.com

I'm building a Jarvis-style conversational AI assistant that uses a large language model (LLM) behind the scenes. To make the experience as seamless and natural as possible, I want the assistant to start speaking as soon as the LLM begins generating its response, token by token.

To achieve this, I need a text-to-speech (TTS) model that can operate with extremely low latency and generate audio in a word-by-word or phoneme-by-phoneme fashion as the text stream comes …
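For anyone picturing the pipeline described here, a minimal sketch of the buffering step between the LLM and the TTS engine might look like the following. It only groups streamed tokens into short phrases at punctuation boundaries so that audio can start before the full response exists. The function name `chunk_tokens`, the boundary set, and the hard-coded token list are all assumptions for illustration; no particular LLM or TTS library is implied, and a phoneme-level approach would need a streaming-capable TTS model instead.

```python
from typing import Iterable, Iterator


def chunk_tokens(tokens: Iterable[str], boundaries: str = ".,;:!?\n") -> Iterator[str]:
    """Group streamed LLM tokens into short phrases for a TTS engine.

    Hypothetical helper: `tokens` stands in for whatever streaming
    interface the LLM exposes, yielding text fragments one at a time.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        trimmed = buffer.rstrip()
        # Flush as soon as the buffered text ends at a phrase boundary,
        # so the TTS engine can start speaking before the LLM finishes.
        if trimmed and trimmed[-1] in boundaries:
            yield trimmed.strip()
            buffer = ""
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Stand-in for a real token stream (assumed shape: small text fragments).
    fake_stream = ["Hello", ",", " how", " are", " you", "?", " I", " am", " fine"]
    for phrase in chunk_tokens(fake_stream):
        # In a real assistant, each phrase would be handed to the TTS engine here.
        print(phrase)
```

Flushing on punctuation is a trade-off: smaller chunks lower the time to first audio, but very short chunks tend to hurt prosody, and a naive boundary check like this one will also split on things such as decimal points.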

