April 22, 2024, 4:46 a.m. | Darshan Prabhu, Sai Ganesh Mirishkar, Pankaj Wasnik

cs.CL updates on arXiv.org

arXiv:2404.12628v1 Announce Type: new
Abstract: Self-supervised learned (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks. Given the effectiveness of such models, it is advantageous to use them in conventional ASR systems. While some approaches suggest incorporating these models as a trainable encoder or a learnable frontend, training such systems is extremely slow and requires a lot of computation cycles. In this work, we propose two simple approaches that use (1) framewise addition and (2) cross-attention …
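The abstract is cut off before it says how the two mechanisms are wired into the ASR system, so the following is only a generic sketch of what "framewise addition" and "cross-attention" usually mean when combining two feature sequences. Everything here is an assumption made for illustration: the function names `framewise_add` and `cross_attention` are hypothetical, both sequences are assumed to share the same feature dimension, and the attention uses plain scaled dot-products with no learned projections. It is not the authors' implementation.

```python
import math


def framewise_add(asr_feats, ssl_feats):
    """Sum two feature sequences frame by frame.

    Assumes both sequences have the same number of frames and the same
    feature dimension (a simplification made for this sketch).
    """
    if len(asr_feats) != len(ssl_feats):
        raise ValueError("framewise addition needs equal frame counts")
    return [[a + s for a, s in zip(fa, fs)]
            for fa, fs in zip(asr_feats, ssl_feats)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(asr_feats, ssl_feats):
    """Let every ASR frame attend over all SSL frames.

    Queries come from asr_feats; keys and values are both ssl_feats.
    The two sequences may differ in length, and the output has one
    frame per query.
    """
    dim = len(asr_feats[0])
    fused = []
    for query in asr_feats:
        # Scaled dot-product score against every SSL frame.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in ssl_feats]
        weights = softmax(scores)
        # Weighted average of the SSL frames.
        fused.append([sum(w * value[j] for w, value in zip(weights, ssl_feats))
                      for j in range(dim)])
    return fused


if __name__ == "__main__":
    asr = [[1.0, 0.0], [0.0, 1.0]]                 # 2 frames, dim 2
    ssl_same_len = [[0.5, 0.5], [0.25, 0.75]]      # 2 frames, dim 2
    ssl_longer = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]  # 3 frames, dim 2

    print(framewise_add(asr, ssl_same_len))
    print(cross_attention(asr, ssl_longer))
```

One practical difference the sketch makes visible: framewise addition requires the two sequences to be aligned frame for frame, whereas cross-attention tolerates different frame rates because each output frame is a weighted average over the whole SSL sequence.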

