April 6, 2022, 1:10 a.m. | Minsu Kim, Jeong Hun Yeo, Yong Man Ro

cs.CV updates on arXiv.org

Recognizing speech from silent lip movement, known as lip reading, is a
challenging task due to 1) the inherent insufficiency of lip movement to
fully represent speech, and 2) the existence of homophenes, which have
similar lip movements but different pronunciations. In this paper, we aim to
alleviate these two challenges in lip reading by proposing a Multi-head
Visual-audio Memory (MVM). Firstly, MVM is trained with audio-visual
datasets and remembers audio representations by modelling the
inter-relationships …
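The abstract describes a multi-head memory that is addressed by visual features and reads out stored audio representations. As a rough illustration only, the sketch below shows a generic multi-head key-value memory in PyTorch: visual frame features attend over learnable key slots, and the attention weights read out audio-like value slots. The class name, slot count, head count, dimensions, and the simple additive fusion are all placeholder assumptions; they are not the paper's actual MVM design or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadVisualAudioMemory(nn.Module):
    """Illustrative multi-head key-value memory (not the paper's exact architecture).

    Visual features address a bank of learnable key slots; the resulting
    attention weights read out stored audio-like value slots.
    """
    def __init__(self, dim=512, num_slots=112, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Keys used for visual addressing; values hold audio-like representations.
        self.keys = nn.Parameter(torch.randn(num_heads, num_slots, self.head_dim))
        self.values = nn.Parameter(torch.randn(num_heads, num_slots, self.head_dim))

    def forward(self, visual_feat):
        # visual_feat: (batch, time, dim) frame-level visual features.
        b, t, d = visual_feat.shape
        q = visual_feat.view(b, t, self.num_heads, self.head_dim)
        # Addressing scores against each head's key slots.
        scores = torch.einsum('bthd,hsd->bths', q, self.keys) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)  # (batch, time, heads, slots)
        # Read out audio-like representations and merge the heads.
        read = torch.einsum('bths,hsd->bthd', attn, self.values)
        return read.reshape(b, t, d)

# Usage sketch: complement visual features with memory-recalled audio information.
mvm = MultiHeadVisualAudioMemory()
video_feat = torch.randn(2, 25, 512)   # e.g. 25 frames of visual features
audio_like = mvm(video_feat)           # (2, 25, 512)
fused = video_feat + audio_like        # naive fusion, for illustration only
```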

Tags: arxiv, audio, cv, head, lip reading, memory, reading
