June 14, 2024, 4:48 a.m. | Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwarth, Kristen Grauman

cs.CV updates on arXiv.org arxiv.org

arXiv:2406.09272v1 Announce Type: new
Abstract: Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinations at test time. We propose a novel ambient-aware audio generation model, AV-LDM. We devise a novel audio-conditioning mechanism …

abstract action ambient applications arxiv audio cs.ai cs.cv cs.sd eess.as effects films games human human interactions important interactions off reality sound sound effects total training type video videos virtual virtual reality virtual reality games

AI Focused Biochemistry Postdoctoral Fellow

@ Lawrence Berkeley National Lab | Berkeley, CA

Senior Data Engineer

@ Displate | Warsaw

PhD Student AI simulation electric drive (f/m/d)

@ Volkswagen Group | Kassel, DE, 34123

AI Privacy Research Lead

@ Leidos | 6314 Remote/Teleworker US

Senior Platform System Architect, Silicon

@ Google | New Taipei, Banqiao District, New Taipei City, Taiwan

Fabrication Hardware Litho Engineer, Quantum AI

@ Google | Goleta, CA, USA