End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue. (arXiv:2206.12040v1 [eess.AS])
June 27, 2022, 1:11 a.m. | Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, Keiichi Tokuda
cs.CL updates on arXiv.org
Recent text-to-speech (TTS) systems have achieved quality comparable to that of
human speech; however, their application to spoken dialogue has not been widely
studied. This study aims to realize a TTS system that closely resembles human
dialogue. First, we record and transcribe actual spontaneous dialogues. Then,
the proposed dialogue TTS is trained in two stages. In the first stage, a variational
autoencoder (VAE)-VITS or Gaussian mixture variational autoencoder (GMVAE)-VITS
is trained; this model introduces an utterance-level latent variable into
variational inference with adversarial learning for end-to-end …
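The abstract is cut off before it details the latent-variable formulation. As a generic illustration only, and not the authors' implementation, the Python sketch below shows one common way to handle an utterance-level latent vector in a GMVAE-style model. It draws a reparameterized sample from a diagonal Gaussian posterior and estimates, by Monte Carlo, the KL divergence from that posterior to a Gaussian mixture prior. The function names, the diagonal-covariance assumption, and the sampling-based KL estimate are all assumptions made for this sketch.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a diagonal Gaussian N(mean, diag(var)) evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x, weights, means, variances):
    """Log density of a Gaussian mixture prior, computed stably with log-sum-exp."""
    terms = [
        math.log(w) + log_normal(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sample_posterior(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps for one utterance (hypothetical helper)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def mc_kl_to_mixture(mu, log_var, weights, means, variances, n_samples=1000, seed=0):
    """Monte Carlo estimate of KL(q(z | utterance) || Gaussian mixture prior)."""
    rng = random.Random(seed)
    var = [math.exp(lv) for lv in log_var]
    total = 0.0
    for _ in range(n_samples):
        z = sample_posterior(mu, log_var, rng)
        # Each sample contributes log q(z) - log p(z); the mean approximates the KL.
        total += log_normal(z, mu, var) - log_mixture(z, weights, means, variances)
    return total / n_samples
```

A sampling estimate is used because the KL divergence between a Gaussian and a Gaussian mixture has no closed form. With a single standard-normal component, the estimate should approach the analytic Gaussian KL. For example, a one-dimensional posterior with mean 1 and unit variance gives a value close to 0.5.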