all AI news
FALL-E: A Foley Sound Synthesis Model and Strategies. (arXiv:2306.09807v2 [eess.AS] UPDATED)
cs.LG updates on arXiv.org arxiv.org
This paper introduces FALL-E, a foley synthesis system and its
training/inference strategies. The FALL-E model employs a cascaded approach
comprising low-resolution spectrogram generation, spectrogram super-resolution,
and a vocoder. We trained every sound-related model from scratch using our
extensive datasets, and utilized a pre-trained language model. We conditioned
the model with dataset-specific texts, enabling it to learn sound quality and
recording environment based on text input. Moreover, we leveraged external
language models to improve text descriptions of our datasets and performed …
arxiv datasets inference language language model low paper sound spectrogram strategies synthesis training