Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading. (arXiv:2204.01725v1 [cs.CV])
April 6, 2022, 1:10 a.m. | Minsu Kim, Jeong Hun Yeo, Yong Man Ro
cs.CV updates on arXiv.org arxiv.org
Recognizing speech from silent lip movement, known as lip reading, is a challenging task due to 1) the inherent insufficiency of lip movement to fully represent speech, and 2) the existence of homophenes, which have similar lip movements but different pronunciations. In this paper, we try to alleviate these two challenges in lip reading by proposing a Multi-head Visual-audio Memory (MVM). Firstly, MVM is trained on audio-visual datasets and remembers audio representations by modelling the inter-relationships …
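The abstract is truncated above, but the general idea it names, a multi-head key-value memory that maps a visual query to stored audio representations, can be sketched generically. Below is a minimal, hypothetical illustration using dot-product memory addressing split across heads; it is not the paper's actual MVM, and all names, shapes, and the addressing scheme are illustrative assumptions.

```python
import numpy as np

def multi_head_memory_read(query, keys, values, num_heads):
    """Hypothetical sketch: read audio-like representations from a
    key-value memory addressed by a visual query, one lookup per head.
    This is NOT the paper's MVM, just a generic multi-head memory read."""
    d = query.shape[-1]
    dh = d // num_heads
    # Split the query, keys, and values into per-head sub-vectors.
    q = query.reshape(num_heads, dh)          # (heads, dh)
    k = keys.reshape(-1, num_heads, dh)       # (slots, heads, dh)
    v = values.reshape(-1, num_heads, dh)     # (slots, heads, dh)
    out = np.empty_like(q)
    for h in range(num_heads):
        scores = k[:, h] @ q[h] / np.sqrt(dh)  # (slots,) similarities
        w = np.exp(scores - scores.max())
        w /= w.sum()                           # softmax addressing weights
        out[h] = w @ v[:, h]                   # weighted read of audio slots
    return out.reshape(d)

# Toy usage: one visual feature vector queries 8 memory slots with 4 heads.
rng = np.random.default_rng(0)
d, slots, heads = 16, 8, 4
visual_query = rng.standard_normal(d)
key_memory = rng.standard_normal((slots, d))
audio_memory = rng.standard_normal((slots, d))
audio_like = multi_head_memory_read(visual_query, key_memory, audio_memory, heads)
print(audio_like.shape)  # (16,)
```

Because each head addresses the memory independently, different heads can retrieve different candidate audio representations for the same lip movement, which is one plausible way such a design could help disambiguate homophenes.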