March 14, 2024, 4:42 a.m. | Wenjing Zhu, Sining Sun, Changhao Shan, Peng Fan, Qing Yang

cs.LG updates on arXiv.org

arXiv:2403.08258v1 Announce Type: cross
Abstract: Conformer-based attention models have become the de facto backbone for Automatic Speech Recognition tasks. A blank symbol is usually introduced to align the input and output sequences for CTC or RNN-T models. Unfortunately, because the attention mechanism's cost grows quadratically with the input length, long inputs strain both the computational budget and memory consumption. In this work, we propose a "Skip-and-Recover" Conformer architecture, named Skipformer, to squeeze the input sequence length dynamically and inhomogeneously. Skipformer uses an intermediate CTC output as criteria …
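A minimal sketch of the frame-skipping idea described above, assuming the intermediate CTC head's blank posteriors decide which frames are dropped before the remaining attention layers. The function name, threshold rule, and tensor shapes are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch: drop frames that an intermediate CTC head labels as likely blank,
# so later attention layers see a shorter sequence. The blank-threshold rule and
# all names here are assumptions for illustration, not the paper's exact design.
import torch

def skip_blank_frames(encoder_states, ctc_logits, blank_id=0, blank_threshold=0.5):
    """Keep frames whose intermediate-CTC blank probability is below blank_threshold.

    encoder_states: (T, D) hidden states from the early Conformer blocks.
    ctc_logits:     (T, V) intermediate CTC logits (blank symbol included).
    Returns the reduced states and the indices of kept frames, so the skipped
    frames could later be "recovered" and merged back for the final output.
    """
    blank_prob = ctc_logits.softmax(dim=-1)[:, blank_id]     # (T,)
    keep_mask = blank_prob < blank_threshold                  # frames likely to carry content
    kept_idx = keep_mask.nonzero(as_tuple=True)[0]
    return encoder_states[kept_idx], kept_idx

# Toy usage: 100 frames, 256-dim states, vocabulary of 30 symbols.
states = torch.randn(100, 256)
logits = torch.randn(100, 30)
reduced, kept_idx = skip_blank_frames(states, logits)
print(reduced.shape, kept_idx.shape)  # fewer frames feed the remaining layers
```

Since attention cost scales quadratically with sequence length, even a modest reduction in kept frames would cut the dominant compute and memory terms substantially.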
