Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody. (arXiv:2206.14643v1 [eess.AS])
June 30, 2022, 1:12 a.m. | Peter Makarov, Ammar Abbas, Mateusz Łajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou
cs.CL updates on arXiv.org arxiv.org
Generating expressive and contextually appropriate prosody remains a
challenge for modern text-to-speech (TTS) systems. This is particularly evident
for long, multi-sentence inputs. In this paper, we examine simple extensions to
a Transformer-based FastSpeech-like system, with the goal of improving prosody
for multi-sentence TTS. We find that long context, powerful text features, and
training on multi-speaker data all improve prosody. More interestingly, they
result in synergies. Long context disambiguates prosody, improves coherence,
and plays to the strengths of Transformers. Fine-tuning word-level …