March 15, 2024, 4:42 a.m. | Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.09259v1 Announce Type: cross
Abstract: Active learning (AL) techniques reduce labeling costs for training neural machine translation (NMT) models by selecting smaller representative subsets from unlabeled data for annotation. Diversity sampling techniques select heterogeneous instances, while uncertainty sampling methods select instances with the highest model uncertainty. Both approaches have limitations - diversity methods may extract varied but trivial examples, while uncertainty sampling can yield repetitive, uninformative instances. To bridge this gap, we propose HUDS, a hybrid AL strategy for domain …

abstract active learning annotation arxiv costs cs.cl cs.lg data diversity hybrid instances labeling machine machine translation neural machine translation reduce sampling subsets training translation type uncertainty

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US