Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
Feb. 21, 2024, 5:46 a.m. | David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
cs.CV updates on arXiv.org
Abstract: Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR). Similar to other speech processing tasks, these end-to-end VSR systems are usually based on encoder-decoder architectures. While encoders are somewhat general, multiple decoding approaches have been explored, such as the conventional hybrid model based on Deep Neural Networks combined with Hidden Markov Models (DNN-HMM) or the Connectionist Temporal Classification (CTC) paradigm. …