Web: http://arxiv.org/abs/2201.03655

Jan. 12, 2022, 2:10 a.m. | Chhavi Choudhury, Ankur Gandhe, Xiaohan Ding, Ivan Bulyko

cs.LG updates on arXiv.org

End-to-end (E2E) automatic speech recognition (ASR) models such as the
Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for
streaming ASR applications such as voice assistants. While E2E models are very
effective at learning representations of their training data, their accuracy
on unseen domains remains a challenging problem. Additionally, these models
require paired audio and text training data, are computationally expensive,
and are difficult to adapt to the fast-evolving nature of conversational
speech. In this work, we explore a contextual biasing approach using a
likelihood ratio that leverages text …
