Web: http://arxiv.org/abs/2205.05684

May 13, 2022, 1:10 a.m. | Otavio Braga, Olivier Siohan

cs.CV updates on arXiv.org

Audio-visual automatic speech recognition is a promising approach to robust
ASR under noisy conditions. Until recently, however, it was studied in
isolation, on the assumption that the video of a single speaking face matches
the audio; selecting the active speaker at inference time when multiple people
are on screen was set aside as a separate problem. As an alternative, recent
work has proposed addressing the two problems simultaneously with an attention
mechanism, baking the speaker selection problem directly …
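
The idea of handling speaker selection with attention can be illustrated with a minimal sketch. It assumes scaled dot-product attention in which an audio embedding for one time step acts as the query over one visual embedding per on-screen face. The abstract above is truncated and does not confirm this exact formulation, so the function name `attend_over_faces`, the embedding sizes, and the toy values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_faces(audio_query, face_features):
    """Soft-select among candidate faces for a single time step.

    audio_query:   d-dimensional audio embedding (hypothetical).
    face_features: one d-dimensional visual embedding per face (hypothetical).
    Returns (weights, fused): attention weights over faces, and the
    weighted sum of the face embeddings.
    """
    d = len(audio_query)
    # Scaled dot-product score between the audio query and each face.
    scores = [
        sum(q * k for q, k in zip(audio_query, face)) / math.sqrt(d)
        for face in face_features
    ]
    weights = softmax(scores)
    # The fused visual feature is a convex combination of the face embeddings.
    fused = [
        sum(w * face[i] for w, face in zip(weights, face_features))
        for i in range(d)
    ]
    return weights, fused


# Toy example: the first face correlates with the audio, the second does not.
audio_query = [1.0, 0.0]
faces = [[4.0, 0.0], [0.0, 4.0]]
weights, fused = attend_over_faces(audio_query, faces)
print(weights, fused)
```

Because the weights are a soft distribution rather than a hard choice, a selection step of this kind stays differentiable and could in principle be trained jointly with the recognizer instead of as a separate active-speaker stage.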

