March 12, 2024, 4:52 a.m. | Roi Benita, Michael Elad, Joseph Keshet

cs.CL updates on arXiv.org

arXiv:2310.01381v3 Announce Type: replace-cross
Abstract: Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has focused on generating spectrograms, which then require a subsequent model (i.e., a vocoder) to convert the spectrogram to a waveform. This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, generating overlapping frames sequentially, where each frame is conditioned on a portion of the previously generated …
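
The sequential, overlapping-frame generation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `sample_frame` is a hypothetical stand-in for a conditional diffusion sampler (it only shrinks Gaussian noise), and the names `frame_len`, `overlap`, and `steps` are assumptions introduced for illustration. Pinning the overlapping samples to the previous output is one simple way to condition a frame, chosen here for clarity rather than taken from the paper.

```python
import random


def sample_frame(context, frame_len, rng, steps=4):
    """Hypothetical stand-in for a conditional diffusion sampler.

    Starts from Gaussian noise and applies a few toy update steps.
    The first len(context) samples are pinned to the context so that
    consecutive frames agree on their overlapping region.
    """
    frame = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    for _ in range(steps):
        frame = [0.5 * x for x in frame]   # toy update, not a real denoiser
        frame[:len(context)] = context     # re-impose the conditioning samples
    return frame


def generate_waveform(num_frames, frame_len, overlap, seed=0):
    """Generate overlapping frames one after another and stitch them."""
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_len")
    rng = random.Random(seed)
    waveform = []
    for _ in range(num_frames):
        # Condition on the tail of what has been generated so far.
        context = waveform[-overlap:] if (overlap and waveform) else []
        frame = sample_frame(context, frame_len, rng)
        # Only the samples beyond the overlap are new.
        waveform.extend(frame[len(context):])
    return waveform


if __name__ == "__main__":
    wav = generate_waveform(num_frames=4, frame_len=8, overlap=3)
    print(len(wav))
```

In this sketch the first frame contributes `frame_len` samples and each later frame contributes `frame_len - overlap` new samples, so the output length grows linearly with the number of frames.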

