April 15, 2024, 4:47 a.m. | Guangyu Yang, Jinghong Chen, Weizhe Lin, Bill Byrne

cs.CL updates on arXiv.org

arXiv:2311.08380v2 Announce Type: replace
Abstract: Minimum Bayes Risk (MBR) decoding can significantly improve the translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. We show how the recently developed Reinforcement Learning technique, Direct Preference Optimization (DPO), can fine-tune MLLMs to obtain the gains of MBR without any additional computation at inference time. Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
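The core idea can be sketched in a few lines of Python: sample several candidate translations, score each one against the others with a utility metric to get its expected utility (MBR decoding), and optionally turn those scores into preference pairs for DPO fine-tuning. The token-overlap F1 utility (a stand-in for BLEU or COMET), the candidate samples, and the best-vs-worst pair policy below are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of MBR candidate selection and of building DPO preference
# pairs from MBR scores; the utility metric and pairing policy are assumed.

from collections import Counter


def token_f1(hyp: str, ref: str) -> float:
    # Placeholder utility metric: token-overlap F1 (stand-in for BLEU/COMET).
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if not hyp_tokens or not ref_tokens or overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_scores(candidates: list[str]) -> list[float]:
    # Expected utility of each candidate, using the other samples as pseudo-references.
    scores = []
    for i, hyp in enumerate(candidates):
        refs = [c for j, c in enumerate(candidates) if j != i]
        scores.append(sum(token_f1(hyp, r) for r in refs) / len(refs))
    return scores


def mbr_decode(candidates: list[str]) -> str:
    # MBR decoding: return the candidate with the highest expected utility.
    scores = mbr_scores(candidates)
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]


def dpo_pair(candidates: list[str]) -> tuple[str, str]:
    # Assumed pairing policy: MBR-best candidate as 'chosen', MBR-worst as 'rejected'.
    scores = mbr_scores(candidates)
    order = sorted(range(len(candidates)), key=scores.__getitem__)
    return candidates[order[-1]], candidates[order[0]]


if __name__ == "__main__":
    # Hypothetical sampled translations for one source sentence.
    samples = [
        "the cat sits on the mat",
        "a cat is sitting on the mat",
        "the cat sat on a rug",
    ]
    print("MBR choice:", mbr_decode(samples))
    chosen, rejected = dpo_pair(samples)
    print("DPO pair:", chosen, "|", rejected)

Fine-tuning on such pairs with a DPO objective lets the model internalize the MBR preference ordering, so that at inference time a single greedy or beam decode suffices, which is where the computational saving comes from.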

