all AI news
Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation. (arXiv:2206.12772v2 [cs.CV] UPDATED)
Aug. 16, 2022, 1:13 a.m. | Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang
cs.CV updates on arXiv.org arxiv.org
We present a simple yet effective self-supervised framework for audio-visual
representation learning, to localize the sound source in videos. To understand
what enables to learn useful representations, we systematically investigate the
effects of data augmentations, and reveal that (1) composition of data
augmentations plays a critical role, i.e. explicitly encouraging the
audio-visual representations to be invariant to various transformations~({\em
transformation invariance}); (2) enforcing geometric consistency substantially
improves the quality of learned representations, i.e. the detected sound source
should follow the …
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Social Insights & Data Analyst (Freelance)
@ Media.Monks | Jakarta
Cloud Data Engineer
@ Arkatechture | Portland, ME, USA