Feb. 20, 2024, 5:43 a.m. | Nuo Xu, Jun Zhao, Can Zu, Tao Gui, Qi Zhang, Xuanjing Huang

cs.LG updates on arXiv.org arxiv.org

arXiv:2402.11525v1 Announce Type: cross
Abstract: Faithfulness, expressiveness, and elegance are the constant pursuit in machine translation. However, traditional metrics like BLEU do not strictly align with human preferences for translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. It is non-trivial to collect a large, high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by …
