March 15, 2024, 4:42 a.m. | Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

cs.LG updates on arXiv.org

arXiv:2403.09259v1 Announce Type: cross
Abstract: Active learning (AL) techniques reduce labeling costs for training neural machine translation (NMT) models by selecting smaller, representative subsets of unlabeled data for annotation. Diversity sampling techniques select heterogeneous instances, while uncertainty sampling methods select the instances on which the model is most uncertain. Both approaches have limitations: diversity methods may extract varied but trivial examples, while uncertainty sampling can yield repetitive, uninformative instances. To bridge this gap, we propose HUDS, a hybrid AL strategy for domain …
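
To make the contrast between the two sampling families concrete, here is a minimal Python sketch of a generic hybrid selector. It is not the HUDS algorithm, whose details are cut off in the excerpt above. Everything in it is an assumption chosen for illustration: the function name `hybrid_select`, the use of entropy as the uncertainty score, distance from the pool centroid as the diversity score, and the weight `lam` that mixes the two.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max(values):
    """Rescale values to [0, 1] so the two scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_select(embeddings, probs, k, lam=0.5):
    """Return the indices of the k highest-scoring unlabeled items.

    embeddings: one vector per item (hypothetical sentence embeddings).
    probs: one predictive distribution per item (hypothetical model output).
    lam: illustrative weight; 1.0 is pure uncertainty, 0.0 is pure diversity.
    """
    dim = len(embeddings[0])
    centroid = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]

    # Uncertainty score: entropy of the model's predictive distribution.
    uncertainty = min_max([entropy(p) for p in probs])
    # Diversity score: distance from the pool centroid (outliers score higher).
    diversity = min_max([euclidean(e, centroid) for e in embeddings])

    scores = [lam * u + (1 - lam) * d for u, d in zip(uncertainty, diversity)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy pool: item 0 is the most uncertain, item 2 is the farthest from the rest.
embeddings = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [0.1, 0.0]]
probs = [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1], [0.99, 0.01]]
selected = hybrid_select(embeddings, probs, k=2)
```

Setting `lam` to 1.0 or 0.0 reduces the sketch to pure uncertainty or pure diversity sampling, which makes it easy to see how each extreme picks a different item from the same pool.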

