Feb. 6, 2024, 5:44 a.m. | Teysir Baoueb (IP Paris, LTCI, IDS, S2A), Haocheng Liu (IP Paris, LTCI, IDS, S2A), Mathieu Fontaine (IP Paris, LTCI, IDS, S2A), Jonathan L

cs.LG updates on arXiv.org

Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues, including mode collapse and divergence. In this paper, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrograms. In our model, training stability is enhanced by means of a forward diffusion process, which consists of injecting noise from a Gaussian distribution into both …
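The abstract is truncated here, so the following is only a minimal sketch of the stabilization idea it describes: in Diffusion-GAN-style training, both real and generated signals are perturbed with the same Gaussian forward-diffusion step before being scored by the discriminator. The function names, the `betas` schedule, and the DDPM-style closed-form noising below are illustrative assumptions, not the paper's actual code.

```python
import torch

def forward_diffuse(x: torch.Tensor, t: int, betas: torch.Tensor) -> torch.Tensor:
    """Apply t steps of a Gaussian forward diffusion to a batch of signals.

    x     : (batch, ...) clean signals (e.g. waveforms or mel spectrograms)
    t     : diffusion step index, 0 <= t < len(betas)
    betas : per-step noise variances of the forward process (1-D tensor)
    """
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x)
    # DDPM-style closed form: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * noise
    return a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise

# Hypothetical discriminator preprocessing: real and generated audio are noised
# with the same diffusion step before being scored, which is the kind of
# training stabilization the abstract refers to.
def discriminator_inputs(real, fake, t, betas):
    return forward_diffuse(real, t, betas), forward_diffuse(fake, t, betas)
```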

Subjects: cs.LG, cs.SD, eess.AS, eess.SP
