Web: http://arxiv.org/abs/2201.10375

Jan. 26, 2022, 2:11 a.m. | Artem Gorodetskii, Ivan Ozhiganov

cs.LG updates on arXiv.org

With recent advances in voice cloning, speech synthesis for a target speaker
has approached human-level quality.
However, autoregressive voice cloning systems still suffer from text alignment
failures, resulting in an inability to synthesize long sentences. In this work,
we propose a variant of an attention-based text-to-speech system that can
reproduce a target voice from a few seconds of reference speech and generalize
to very long utterances as well. The proposed system is based on …


