Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos | allainews.com

April 16, 2024, 4:48 a.m. | Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

cs.CV updates on arXiv.org

arXiv:2307.04760v3 Announce Type: replace
Abstract: We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. Our method uses a masked auto-encoding framework to synthesize masked binaural (multi-channel) audio through the synergy of audio and vision, thereby learning useful spatial relationships between the two modalities. We use our pretrained features to tackle two downstream video tasks requiring spatial understanding in social scenarios: active speaker detection and spatial audio denoising. Through extensive experiments, we show that …
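The abstract names a masked auto-encoding objective but gives no implementation details. The sketch below shows only the generic masked-reconstruction idea as commonly used in masked autoencoders. It is not the authors' method. Several choices here are illustrative assumptions: binaural audio is split into spectrogram patches from the left and right channels, a random subset of patches is hidden, and the loss is scored only on the hidden patches. The helper names `random_patch_mask` and `masked_mse` and the `mask_ratio` parameter are hypothetical. The visual conditioning that the paper relies on is not modelled.

```python
import random
from typing import List

# Hypothetical representation: one patch = a flat list of spectrogram values.
Patch = List[float]


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> List[bool]:
    """Return one flag per patch; True marks a patch hidden from the encoder.

    Illustrative helper (not from the paper): hides round(num_patches * mask_ratio)
    patches chosen uniformly at random.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def masked_mse(target: List[Patch], prediction: List[Patch], mask: List[bool]) -> float:
    """Mean squared error over masked patches only (generic MAE-style loss)."""
    total, count = 0.0, 0
    for tgt, pred, is_masked in zip(target, prediction, mask):
        if not is_masked:
            continue  # visible patches are given to the model, so they are not scored
        for t, p in zip(tgt, pred):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Assumed toy binaural input: two patches per channel, two values per patch.
    left = [[0.1, 0.2], [0.3, 0.4]]
    right = [[0.5, 0.6], [0.7, 0.8]]
    patches = left + right  # masking can hide patches from either channel
    mask = random_patch_mask(len(patches), mask_ratio=0.5, seed=0)
    prediction = [[0.0, 0.0] for _ in patches]  # stand-in for a decoder's output
    print(mask, masked_mse(patches, prediction, mask))
```

Because masked patches can come from either ear's channel, a model can only reconstruct them well by exploiting inter-channel (spatial) cues. In the paper's setting, it also draws on the accompanying video. This is the intuition behind using reconstruction as a self-supervised signal.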
