Feb. 21, 2024, 5:46 a.m. | David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

cs.CV updates on arXiv.org

arXiv:2402.13004v1 Announce Type: new
Abstract: Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR). Similar to other speech processing tasks, these end-to-end VSR systems are usually based on encoder-decoder architectures. While encoders are somewhat general, multiple decoding approaches have been explored, such as the conventional hybrid model based on Deep Neural Networks combined with Hidden Markov Models (DNN-HMM) or the Connectionist Temporal Classification (CTC) paradigm. …
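As a rough illustration of the CTC paradigm named in the abstract, the sketch below shows greedy CTC decoding: take the best symbol per frame, merge consecutive repeats, then drop the blank symbol. It is not the authors' system; the function name `ctc_greedy_decode`, the blank index 0, and the toy scores are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Collapse a per-frame best path into a label sequence.

    frame_scores: list of per-frame score lists, one score per symbol.
    """
    # Pick the highest-scoring symbol index for every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Merge consecutive repeats, then remove blanks.
    return [sym for sym, _ in groupby(best_path) if sym != blank]


# Toy scores over 3 symbols (0 = blank); values are made up for illustration.
scores = [
    [0.10, 0.80, 0.10],  # best: 1
    [0.20, 0.70, 0.10],  # best: 1 (repeat, merged)
    [0.90, 0.05, 0.05],  # best: blank (separates the two 1s)
    [0.10, 0.60, 0.30],  # best: 1
    [0.10, 0.20, 0.70],  # best: 2
]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank frame is what lets the same label appear twice in a row in the output; without it, the two runs of symbol 1 would be merged into one.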

