all AI news
Robust Self-Supervised Audio-Visual Speech Recognition. (arXiv:2201.01763v1 [cs.SD])
Jan. 6, 2022, 2:10 a.m. | Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed
cs.LG updates on arXiv.org arxiv.org
Audio-based automatic speech recognition (ASR) degrades significantly in
noisy environments and is particularly vulnerable to interfering speech, as the
model cannot determine which speaker to transcribe. Audio-visual speech
recognition (AVSR) systems improve robustness by complementing the audio stream
with the visual information that is invariant to noise and helps the model
focus on the desired speaker. However, previous AVSR work focused solely on the
supervised learning setup; hence the progress was hindered by the amount of
labeled data available. In …
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Senior ML Researcher - 3D Geometry Processing | 3D Shape Generation | 3D Mesh Data
@ Promaton | Europe
Data Scientist
@ Motive | India - Remote
Senior Perception Engineer
@ NVIDIA | US, CA, Santa Clara
Business Data Analyst, Finance and Treasury Data Repositories, Senior Associate
@ State Street | Krakow, Poland
Junior AI Engineer (Internship)
@ Sony | SEU - Italy - Roma
Manager, Data Science 3
@ PayPal | USA - Pennsylvania - Virtual