Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos | allainews.com

June 14, 2024, 4:48 a.m. | Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwarth, Kristen Grauman

cs.CV updates on arXiv.org arxiv.org

arXiv:2406.09272v1 Announce Type: new
Abstract: Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinations at test time. We propose a novel ambient-aware audio generation model, AV-LDM. We devise a novel audio-conditioning mechanism …

abstract action ambient applications arxiv audio cs.ai cs.cv cs.sd eess.as effects films games human human interactions important interactions off reality sound sound effects total training type video videos virtual virtual reality virtual reality games

More from arxiv.org / cs.CV updates on arXiv.org

InstantGroup: Instant Template Generation for Scalable Group of Brain MRI Registration 13 hours ago | arxiv.org

abstract arxiv brain costs +15

Visual Odometry with Neuromorphic Resonator Networks 13 hours ago | arxiv.org

abstract arxiv cs.ai cs.cv +15

CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video 13 hours ago | arxiv.org

arxiv cs.cv dynamic neural radiance field +4

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models 13 hours ago | arxiv.org

arxiv cs.cv instruction-tuned language +6

Towards Training-free Open-world Segmentation via Image Prompt Foundation Models 13 hours ago | arxiv.org

abstract arxiv computer computer vision +33

Re-initialization-free Level Set Method via Molecular Beam Epitaxy Equation Regularization for Image Segmentation 13 hours ago | arxiv.org

abstract arxiv become continuity +15

ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer 13 hours ago | arxiv.org

arxiv cs.ai cs.cv cs.cy +9

Unsupervised Open-Vocabulary Object Localization in Videos 13 hours ago | arxiv.org

abstract advances arxiv attention +21

Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network 13 hours ago | arxiv.org

arxiv compensation cs.cv eess.iv +6

AI Focused Biochemistry Postdoctoral Fellow

@ Lawrence Berkeley National Lab | Berkeley, CA

View on ai-jobs.net

Senior Data Engineer

@ Displate | Warsaw

View on ai-jobs.net

PhD Student AI simulation electric drive (f/m/d)

@ Volkswagen Group | Kassel, DE, 34123

View on ai-jobs.net

AI Privacy Research Lead

@ Leidos | 6314 Remote/Teleworker US

View on ai-jobs.net

Senior Platform System Architect, Silicon

@ Google | New Taipei, Banqiao District, New Taipei City, Taiwan

View on ai-jobs.net

Fabrication Hardware Litho Engineer, Quantum AI

@ Google | Goleta, CA, USA

View on ai-jobs.net