April 16, 2024, 4:44 a.m. | Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle,

cs.LG updates on arXiv.org

arXiv:2404.09841v1 Announce Type: cross
Abstract: This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed description of our model architecture, consisting of a full-context 600M-parameter Conformer encoder pre-trained with BEST-RQ and an RNN-T decoder fine-tuned jointly with the encoder. Our extensive …

