To Label or Not to Label: Hybrid Active Learning for Neural Machine Translation
March 15, 2024, 4:42 a.m. | Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza
cs.LG updates on arXiv.org
Abstract: Active learning (AL) techniques reduce labeling costs for training neural machine translation (NMT) models by selecting smaller, representative subsets of unlabeled data for annotation. Diversity sampling techniques select heterogeneous instances, while uncertainty sampling methods select the instances about which the model is most uncertain. Both approaches have limitations: diversity methods may extract varied but trivial examples, while uncertainty sampling can yield repetitive, uninformative instances. To bridge this gap, we propose HUDS, a hybrid AL strategy for domain …
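The abstract contrasts uncertainty sampling with diversity sampling and motivates a hybrid of the two. As a rough illustration of the general idea (not the paper's actual HUDS algorithm, whose details are truncated here), the sketch below combines a normalized per-example uncertainty score with a greedy farthest-point diversity term over sentence embeddings; the `alpha` weight and the specific distance-based diversity score are assumptions for the sake of the example.

```python
import numpy as np

def hybrid_select(uncertainty, embeddings, k, alpha=0.5):
    """Greedy hybrid acquisition (illustrative, not HUDS itself).

    Each round scores every unlabeled example as
        alpha * normalized uncertainty
        + (1 - alpha) * normalized distance to the nearest already-selected example,
    so picks are both uncertain and spread out in embedding space.
    """
    # Normalize uncertainty to [0, 1] so the two terms are comparable.
    u = (uncertainty - uncertainty.min()) / (np.ptp(uncertainty) + 1e-12)
    # Seed the batch with the single most uncertain example.
    selected = [int(np.argmax(u))]
    for _ in range(k - 1):
        # Distance from every example to its nearest selected example.
        diffs = embeddings[:, None, :] - embeddings[selected][None, :, :]
        d = np.min(np.linalg.norm(diffs, axis=-1), axis=1)
        d = d / (d.max() + 1e-12)
        score = alpha * u + (1 - alpha) * d
        score[selected] = -np.inf  # never re-pick an example
        selected.append(int(np.argmax(score)))
    return selected

# Toy usage: 20 candidate sentences with random uncertainties and embeddings.
rng = np.random.default_rng(0)
batch = hybrid_select(rng.random(20), rng.normal(size=(20, 4)), k=5)
```

With `alpha=1.0` this reduces to pure uncertainty sampling and with `alpha=0.0` to pure farthest-point diversity selection, which mirrors the two failure modes the abstract describes at the extremes.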