April 30, 2024, 4:50 a.m. | Xiang Li, Zhi-Qi Cheng, Jun-Yan He, Xiaojiang Peng, Alexander G. Hauptmann

cs.CL updates on arXiv.org

arXiv:2404.18398v1 Announce Type: new
Abstract: Emotional Text-to-Speech (E-TTS) synthesis has gained significant attention in recent years due to its potential to enhance human-computer interaction. However, current E-TTS approaches often struggle to capture the complexity of human emotions, primarily relying on oversimplified emotional labels or single-modality inputs. To address these limitations, we propose the Multimodal Emotional Text-to-Speech System (MM-TTS), a unified framework that leverages emotional cues from multiple modalities to generate highly expressive and emotionally resonant speech. MM-TTS consists of two …
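The truncated abstract does not name the framework's two components, so the sketch below is only a conceptual illustration of the general idea it describes: fusing emotion cues from several modalities into a single style vector that conditions an acoustic decoder. All module names, dimensions, and the attention-based fusion scheme are illustrative assumptions, not the MM-TTS architecture itself.

```python
import torch
import torch.nn as nn

class EmotionFusion(nn.Module):
    """Hypothetical fusion block: per-modality emotion embeddings -> one style vector."""
    def __init__(self, dims, style_dim=128):
        super().__init__()
        # One projection per modality (e.g., text, audio, image emotion encoders upstream)
        self.projections = nn.ModuleList([nn.Linear(d, style_dim) for d in dims])
        self.attn = nn.Linear(style_dim, 1)  # learns how much each modality contributes

    def forward(self, features):
        # features: list of (batch, dim_i) tensors, one per modality
        projected = torch.stack(
            [proj(f) for proj, f in zip(self.projections, features)], dim=1
        )                                                   # (batch, M, style_dim)
        weights = torch.softmax(self.attn(projected), dim=1)  # (batch, M, 1)
        return (weights * projected).sum(dim=1)             # (batch, style_dim)

class ConditionedTTSDecoder(nn.Module):
    """Toy acoustic decoder: phoneme embeddings + broadcast style vector -> mel frames."""
    def __init__(self, vocab=100, hidden=256, style_dim=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden + style_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, phonemes, style):
        x = self.embed(phonemes)                             # (batch, T, hidden)
        style = style.unsqueeze(1).expand(-1, x.size(1), -1) # broadcast style over time
        h, _ = self.rnn(torch.cat([x, style], dim=-1))
        return self.out(h)                                   # (batch, T, n_mels)

# Dummy usage: three modality feature vectors condition a 50-step utterance.
fusion = EmotionFusion(dims=[768, 512, 512])
decoder = ConditionedTTSDecoder()
feats = [torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 512)]
mels = decoder(torch.randint(0, 100, (2, 50)), fusion(feats))
print(mels.shape)  # torch.Size([2, 50, 80])
```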

