XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

March 22, 2024, 4:48 a.m. | HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

cs.CL updates on arXiv.org

arXiv:2403.14402v1 Announce Type: cross
Abstract: Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is only available in limited amounts and for fewer languages than audio-only resources. To address this gap, we present XLAVS-R, a cross-lingual audio-visual speech representation model for noise-robust speech recognition and translation in over 100 languages. It is designed to maximize …


