Oct. 13, 2022, 1:18 a.m. | Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao

cs.CL updates on arXiv.org arxiv.org

Style transfer for out-of-domain (OOD) speech synthesis aims to generate
speech samples in an unseen style (e.g., speaker identity, emotion, and prosody)
derived from an acoustic reference. It faces two challenges: 1) the highly
dynamic style features of expressive voices are difficult to model and
transfer; and 2) TTS models must be robust enough to handle diverse OOD
conditions that differ from the source data. This paper proposes
GenerSpeech, a text-to-speech model towards high-fidelity zero-shot style
transfer of OOD …

