Feb. 28, 2024, 5:49 a.m. | Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

cs.CL updates on arXiv.org arxiv.org

arXiv:2402.17184v1 Announce Type: new
Abstract: The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires computationally efficient strategies for decoding. In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames. While similar techniques have been investigated …

abstract accuracy adoption arxiv asr automatic speech recognition computational cs.cl cs.sd decoding deployment e2e eess.as encoder parameters rate recognition speech speech recognition strategies type

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Senior Data Engineer

@ Quantexa | Sydney, New South Wales, Australia

Staff Analytics Engineer

@ Warner Bros. Discovery | NY New York 230 Park Avenue South