Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features. (arXiv:2211.00342v1 [cs.SD])
Nov. 2, 2022, 1:11 a.m. | Alexandra Vioni, Georgia Maniati, Nikolaos Ellinas, June Sig Sung, Inchul Hwang, Aimilios Chalamandaris, Pirros Tsiakoulis
cs.LG updates on arXiv.org
Current state-of-the-art methods for automatic synthetic speech evaluation
are based on neural MOS (mean opinion score) prediction models. Such models
include MOSNet and LDNet, which use spectral features as input, and SSL-MOS,
which relies on a pretrained self-supervised learning model that takes the
speech signal directly as input. In modern high-quality neural TTS systems, prosodic
appropriateness with regard to the spoken content is a decisive factor for
speech naturalness. For this reason, we propose to include prosodic and
linguistic features as …