ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech. (arXiv:2207.06389v1 [eess.AS])
cs.LG updates on arXiv.org
Denoising diffusion probabilistic models (DDPMs) have recently achieved
leading performance in many generative tasks. However, the cost of the inherited iterative
sampling process hinders their application to text-to-speech deployment.
Through a preliminary study on diffusion model parameterization, we find that
previous gradient-based TTS models require hundreds or thousands of iterations
to guarantee high sample quality, which poses a challenge for accelerating
sampling. In this work, we propose ProDiff, a progressive fast diffusion model
for high-quality text-to-speech. Unlike previous work estimating the gradient …