Web: http://arxiv.org/abs/2205.05586

May 12, 2022, 1:11 a.m. | Otavio Braga, Takaki Makino, Olivier Siohan, Hank Liao

cs.LG updates on arXiv.org

Traditionally, audio-visual automatic speech recognition has been studied
under the assumption that the speaking face in the visual signal is the face
matching the audio. However, in a more realistic setting, when multiple faces
are potentially on screen, one needs to decide which face to feed to the A/V ASR
system. The present work takes the recent progress of A/V ASR one step further
and considers the scenario where multiple people are simultaneously on screen
(multi-person A/V ASR). We propose …
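The abstract is truncated before the proposed method, but the face-selection problem it sets up can be illustrated with a small sketch: an audio-derived query attends over per-face visual embeddings and softly weights the face that best matches the audio, producing a single visual feature for the A/V ASR model. The function names, embedding dimensions, and dot-product scoring below are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_face_soft(audio_query, face_embeddings):
    """Softly select the speaking face by attending over candidate face tracks.

    audio_query:      (d,)   embedding summarizing the audio track
    face_embeddings:  (n, d) one embedding per candidate face on screen
    Returns attention weights over faces and the blended visual feature.
    """
    # Scaled dot-product scores between the audio query and each face track.
    scores = face_embeddings @ audio_query / np.sqrt(audio_query.shape[0])
    weights = softmax(scores)            # (n,) one weight per candidate face
    blended = weights @ face_embeddings  # (d,) weighted visual feature for ASR
    return weights, blended

# Toy usage: 3 candidate faces, 16-dim embeddings; the audio resembles face 1.
rng = np.random.default_rng(0)
faces = rng.normal(size=(3, 16))
audio = faces[1] + 0.1 * rng.normal(size=16)
w, v = select_face_soft(audio, faces)
print(w.round(3))  # highest weight should fall on face 1
```

Because the selection is a soft attention rather than a hard argmax, the whole pipeline stays differentiable and can be trained end-to-end with the recognizer; a hard decision can still be recovered at inference by taking the highest-weight face.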

arxiv audio automatic speech recognition person speech speech recognition
