Feb. 9, 2024, 5:47 a.m. | Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, My

cs.CL updates on arXiv.org arxiv.org

While recent work shows promising results in expanding the capabilities of large language models (LLMs) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with organic prosodic features relevant to the given input speech without relying on automatic speech recognition (ASR) or text-to-speech (TTS) solutions. Our approach employs a …

cs.cl cs.sd eess.as
