Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
Feb. 20, 2024, 5:43 a.m. | Nuo Xu, Jun Zhao, Can Zu, Tao Gui, Qi Zhang, Xuanjing Huang
cs.LG updates on arXiv.org
Abstract: Faithfulness, expressiveness, and elegance are constant pursuits in machine translation. However, traditional metrics like \textit{BLEU} do not strictly align with human preferences for translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (\textit{RLHF}) to improve translation quality. It is non-trivial to collect a large, high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by …
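The abstract is truncated, so the paper's specific cost-effective strategy is not shown here; as background, a minimal sketch of the pairwise reward-modeling objective commonly used in RLHF preference learning follows. The class and function names (TranslationRewardModel, preference_loss) and the use of pooled sentence encodings are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TranslationRewardModel(nn.Module):
    """Hypothetical reward model: maps a pooled encoding of a
    (source, translation) pair to a single scalar reward."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Stand-in encoder; a real setup would use a pretrained LM encoder.
        self.encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.score_head = nn.Linear(hidden_size, 1)

    def forward(self, pooled_repr: torch.Tensor) -> torch.Tensor:
        return self.score_head(self.encoder(pooled_repr)).squeeze(-1)


def preference_loss(reward_model: nn.Module,
                    repr_preferred: torch.Tensor,
                    repr_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the reward of the human-preferred
    translation above that of the rejected one."""
    r_pos = reward_model(repr_preferred)
    r_neg = reward_model(repr_rejected)
    return -torch.nn.functional.logsigmoid(r_pos - r_neg).mean()


# Toy usage: random features stand in for encoded translation pairs.
model = TranslationRewardModel()
better = torch.randn(4, 768)  # encodings of preferred translations
worse = torch.randn(4, 768)   # encodings of dispreferred translations
loss = preference_loss(model, better, worse)
loss.backward()
```

The trained reward model would then provide the scalar signal for the RL stage of RLHF; how the paper reduces the cost of obtaining the preference comparisons themselves is where the abstract cuts off.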