all AI news
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech. (arXiv:2206.02147v1 [eess.AS])
cs.CL updates on arXiv.org arxiv.org
Polyphone disambiguation aims to capture accurate pronunciation knowledge
from natural text sequences for reliable Text-to-speech (TTS) systems. However,
previous approaches require substantial annotated training data and additional
efforts from language experts, making it difficult to extend high-quality
neural TTS systems to out-of-domain daily conversations and countless languages
worldwide. This paper tackles the polyphone disambiguation problem from a
concise and novel perspective: we propose Dict-TTS, a semantic-aware generative
text-to-speech model with an online website dictionary (the existing prior
information in the …
arxiv dictionary knowledge learning prior speech text text-to-speech tts