Getting More for Less: Using Weak Labels and AV-Mixup for Robust Audio-Visual Speaker Verification

June 14, 2024, 4:47 a.m. | Anith Selvakumar, Homa Fashandi

cs.LG updates on arXiv.org arxiv.org

arXiv:2309.07115v2 Announce Type: replace-cross
Abstract: Distance Metric Learning (DML) has typically dominated the audio-visual speaker verification problem space, owing to its strong performance on new and unseen classes. In our work, we explore multitask learning techniques to further enhance DML, and show that an auxiliary task with even weak labels can increase the quality of the learned speaker representation without increasing model complexity during inference. We also extend the Generalized End-to-End Loss (GE2E) to multimodal inputs and demonstrate that it can …
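
For readers unfamiliar with GE2E, the following is a minimal NumPy sketch of the standard softmax variant of the loss on a batch of speaker embeddings. It is not the authors' implementation. The abstract does not say how audio and visual inputs are combined, so the sketch assumes each embedding is already a single fused audio-visual vector. The function name `ge2e_softmax_loss`, the fixed scale `w` and bias `b` (learnable in practice), and averaging rather than summing over utterances are all illustrative assumptions.

```python
import numpy as np


def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """GE2E softmax loss for emb of shape (N speakers, M utterances, D).

    Assumption: each embedding is an already-fused audio-visual vector.
    w and b are fixed here for illustration; they are normally learned.
    """
    n, m, _ = emb.shape

    def unit(x):
        # L2-normalise along the feature axis
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    emb = unit(emb)
    centroids = emb.mean(axis=1)  # (N, D): one centroid per speaker
    # Leave-one-out centroid: excludes the utterance being scored
    excl = (emb.sum(axis=1, keepdims=True) - emb) / (m - 1)  # (N, M, D)

    # Cosine similarity of every utterance to every speaker centroid
    sim = np.einsum("jid,kd->jik", emb, unit(centroids))  # (N, M, N)
    # For the true speaker, use the leave-one-out centroid instead
    own = np.einsum("jid,jid->ji", emb, unit(excl))  # (N, M)
    idx = np.arange(n)
    sim[idx, :, idx] = own

    logits = w * sim + b
    # Numerically stable log-sum-exp over the speaker axis
    mx = logits.max(axis=-1, keepdims=True)
    lse = mx[..., 0] + np.log(np.exp(logits - mx).sum(axis=-1))
    target = logits[idx, :, idx]  # (N, M): similarity to own speaker
    return float((lse - target).mean())


# Usage: 3 speakers, 4 utterances each, 8-dimensional embeddings
rng = np.random.default_rng(0)
batch = rng.normal(size=(3, 4, 8))
loss = ge2e_softmax_loss(batch)
```

The loss is small when each utterance sits closer to its own speaker's centroid than to any other centroid, which is the property that lets the learned representation generalise to speakers not seen during training.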


