June 6, 2024, 4:43 a.m. | Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic

cs.LG updates on arXiv.org

arXiv:2406.02566v1 Announce Type: cross
Abstract: Emphasizing a data-centric AI approach, this paper introduces a novel two-stage active learning (AL) pipeline for automatic speech recognition (ASR) that combines unsupervised and supervised AL methods. The first stage uses unsupervised AL: it clusters x-vectors to select diverse samples from unlabeled speech data, establishing a robust initial dataset for the subsequent supervised AL. The second stage applies a supervised AL strategy, using a batch AL method developed specifically for ASR, aimed at selecting …
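
The first stage described in the abstract (clustering speaker embeddings and drawing diverse samples from the clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm is used or how samples are drawn from each cluster, so the sketch assumes plain k-means and picks the sample nearest each centroid. The function name `select_diverse_samples`, its parameters, and the random vectors standing in for x-vectors are all hypothetical.

```python
import numpy as np


def select_diverse_samples(embeddings, n_clusters, n_iters=50, seed=0):
    """Return indices of one representative sample per cluster.

    embeddings: (n_samples, dim) array standing in for per-utterance x-vectors.
    Assumption: k-means plus nearest-to-centroid selection; the paper's
    actual clustering and selection rule may differ.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    # Initialise centroids from distinct, randomly chosen samples.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment, then pick the member closest to each centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    selected = []
    for k in range(n_clusters):
        member_idx = np.flatnonzero(labels == k)
        if len(member_idx) > 0:
            selected.append(int(member_idx[dists[member_idx, k].argmin()]))
    return selected


# Toy usage: 300 random 16-dimensional vectors in place of real x-vectors.
fake_xvectors = np.random.default_rng(1).normal(size=(300, 16))
initial_batch = select_diverse_samples(fake_xvectors, n_clusters=10)
print(initial_batch)  # indices of utterances to send for transcription
```

In this sketch the returned indices would form the initial labeled set handed to the supervised second stage; the number of clusters acts as the labeling budget for that first round.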

