April 7, 2024, 7:17 p.m. | /u/rumble_ftw

Machine Learning | www.reddit.com

I'm building a Jarvis-style conversational AI assistant that uses a large language model (LLM) behind the scenes. To make the experience as seamless and natural as possible, I want the assistant to start speaking as soon as the LLM begins generating its response, token by token.

To achieve this, I need a text-to-speech (TTS) model that can operate with extremely low latency and generate audio in a word-by-word or phoneme-by-phoneme fashion as the text stream comes …
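One common way to bridge an LLM token stream and a TTS engine is to buffer tokens and flush a speakable chunk whenever a sentence boundary arrives, so synthesis can begin well before the full reply is generated. Here is a minimal sketch of that chunking step; the `print("SPEAK:", ...)` line is a placeholder for whatever streaming TTS call you end up using, and the boundary heuristic (punctuation plus a minimum length) is an assumption, not a fixed rule:

```python
import re

# Heuristic: treat ., !, or ? at the end of the buffer as a boundary.
SENTENCE_END = re.compile(r"[.!?]\s*$")

def chunk_token_stream(tokens, min_chars=20):
    """Buffer incoming LLM tokens and yield speakable chunks at
    sentence boundaries, so TTS can start before generation ends."""
    buf = ""
    for tok in tokens:
        buf += tok
        # Only flush once the chunk is long enough to sound natural.
        if len(buf) >= min_chars and SENTENCE_END.search(buf):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream

# Simulated token-by-token LLM output.
tokens = ["Hello", ",", " I", " am", " your", " assistant", ".",
          " How", " can", " I", " help", " you", " today", "?"]

for chunk in chunk_token_stream(tokens):
    # Hypothetical TTS hook: swap in your engine's streaming call here.
    print("SPEAK:", chunk)
```

Each yielded chunk can be handed to the TTS engine while the LLM is still generating the next sentence, which hides most of the synthesis latency behind generation time.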

