Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features. (arXiv:2211.00342v1 [cs.SD])
Nov. 2, 2022, 1:15 a.m. | Alexandra Vioni, Georgia Maniati, Nikolaos Ellinas, June Sig Sung, Inchul Hwang, Aimilios Chalamandaris, Pirros Tsiakoulis
cs.CL updates on arXiv.org (arxiv.org)
Current state-of-the-art methods for automatic synthetic speech evaluation
are based on MOS prediction neural models. Such MOS prediction models include
MOSNet and LDNet, which use spectral features as input, and SSL-MOS, which relies
on a pretrained self-supervised learning model that directly uses the speech
signal as input. In modern high-quality neural TTS systems, prosodic
appropriateness with regard to the spoken content is a decisive factor for
speech naturalness. For this reason, we propose to include prosodic and
linguistic features as …