Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module. (arXiv:2202.08164v1 [eess.AS])
Feb. 17, 2022, 8:11 a.m. | Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lore
cs.LG updates on arXiv.org
State-of-the-art text-to-speech (TTS) systems require several hours of
recorded speech data to generate high-quality synthetic speech. When using
reduced amounts of training data, standard TTS models suffer from speech
quality and intelligibility degradations, making training low-resource TTS
systems problematic. In this paper, we propose a novel extremely low-resource
TTS method called Voice Filter that uses as little as one minute of speech from
a target speaker. It uses voice conversion (VC) as a post-processing module
appended to a pre-existing high-quality …