Nov. 2, 2022, 1:11 a.m. | Alexandra Vioni, Georgia Maniati, Nikolaos Ellinas, June Sig Sung, Inchul Hwang, Aimilios Chalamandaris, Pirros Tsiakoulis

cs.LG updates on arXiv.org arxiv.org

Current state-of-the-art methods for automatic synthetic speech evaluation
are based on MOS prediction neural models. Such MOS prediction models include
MOSNet and LDNet that use spectral features as input, and SSL-MOS that relies
on a pretrained self-supervised learning model that directly uses the speech
signal as input. In modern high-quality neural TTS systems, prosodic
appropriateness with regard to the spoken content is a decisive factor for
speech naturalness. For this reason, we propose to include prosodic and
linguistic features as …

arxiv features mos prediction speech text text-to-speech

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Data Engineer (m/f/d)

@ Project A Ventures | Berlin, Germany

Principle Research Scientist

@ Analog Devices | US, MA, Boston