Feb. 9, 2024, 5:47 a.m. | Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, My

cs.CL updates on arXiv.org

While recent work shows promising results in expanding the capabilities of large language models (LLMs) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), that generates coherent spoken responses with organic prosodic features relevant to the given input speech, without relying on automatic speech recognition (ASR) or text-to-speech (TTS) solutions. Our approach employs a …

cs.CL cs.SD eess.AS

