Aug. 11, 2023, 6:45 a.m. | Minsung Kang, Sangshin Oh, Hyeongi Moon, Kyungyun Lee, Ben Sangbae Chon

cs.LG updates on arXiv.org arxiv.org

This paper introduces FALL-E, a foley synthesis system and its
training/inference strategies. The FALL-E model employs a cascaded approach
comprising low-resolution spectrogram generation, spectrogram super-resolution,
and a vocoder. We trained every sound-related model from scratch using our
extensive datasets, and utilized a pre-trained language model. We conditioned
the model with dataset-specific texts, enabling it to learn sound quality and
recording environment based on text input. Moreover, we leveraged external
language models to improve text descriptions of our datasets and performed …

arxiv datasets inference language language model low paper sound spectrogram strategies synthesis training

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Lead Data Modeler

@ Sherwin-Williams | Cleveland, OH, United States