all AI news
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
April 5, 2024, 4:42 a.m. | Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao
cs.LG updates on arXiv.org arxiv.org
Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to …
abstract arxiv codec cs.ai cs.cl cs.lg cs.sd eess.as language language models large language large language models llms modeling performance pitch prompting robust robustness shows speech synthesis text text-to-speech thought tts type work zero-shot
More from arxiv.org / cs.LG updates on arXiv.org
Efficient Data-Driven MPC for Demand Response of Commercial Buildings
2 days, 22 hours ago |
arxiv.org
Testing the Segment Anything Model on radiology data
2 days, 22 hours ago |
arxiv.org
Calorimeter shower superresolution
2 days, 22 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US