Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
April 9, 2024, 4:51 a.m. | Sizhou Chen, Songyang Gao, Sen Fang
cs.CL updates on arXiv.org
Abstract: The Transformer architecture has proven highly effective for Automatic Speech Recognition (ASR) tasks, becoming a foundational component for much of the research in the domain. Historically, many approaches have relied on fixed-length attention windows, which become problematic for speech samples that vary in duration and complexity, leading to data over-smoothing and the neglect of essential long-term connectivity. Addressing this limitation, we introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism that accommodates a …
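The Echo-MSA module itself is not shown here, but the general idea the abstract contrasts it with, attention restricted to a fixed window versus attention that spans each utterance's actual length, can be sketched. The snippet below is an illustrative NumPy sketch of length-aware masked self-attention, not the authors' implementation; the function name, shapes, and masking scheme are all assumptions for the example.

```python
import numpy as np

def masked_self_attention(x, valid_len):
    """Scaled dot-product self-attention over a padded sequence.

    x: (seq_len, d) array of frames; positions >= valid_len are padding.
    valid_len: number of real (non-padded) frames in x.
    Returns a (seq_len, d) attended output with padded rows zeroed.
    """
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (seq_len, seq_len) similarities
    # Mask padded key positions so attention covers exactly the
    # variable-length real content, not a fixed-size window.
    scores[:, valid_len:] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ x
    out[valid_len:] = 0.0  # padded query rows carry no signal
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))   # 6 frames, only the first 4 are real
y = masked_self_attention(x, valid_len=4)
```

Because the mask is derived per utterance from `valid_len`, a short clip and a long one pass through the same module without truncation or over-smoothing from a one-size window.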