all AI news
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
Feb. 21, 2024, 5:42 a.m. | Miri Varshavsky Hassid, Roy Hirsch, Regev Cohen, Tomer Golany, Daniel Freedman, Ehud Rivlin
cs.LG updates on arXiv.org arxiv.org
Abstract: The incorporation of Denoising Diffusion Models (DDMs) in the Text-to-Speech (TTS) domain is rising, providing great value in synthesizing high quality speech. Although they exhibit impressive audio quality, the extent of their semantic capabilities is unknown, and controlling their synthesized speech's vocal properties remains a challenge. Inspired by recent advances in image synthesis, we explore the latent space of frozen TTS models, which is composed of the latent bottleneck activations of the DDM's denoiser. We …
arxiv cs.cl cs.lg cs.sd diffusion eess.as semantic space speech text text-to-speech type
More from arxiv.org / cs.LG updates on arXiv.org
The Perception-Robustness Tradeoff in Deterministic Image Restoration
1 day, 10 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne