Disentangling Length from Quality in Direct Preference Optimization
March 29, 2024, 4:42 a.m. | Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn
cs.LG updates on arXiv.org
Abstract: Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity: a well-formatted and eloquent answer is often rated more highly by users, even when it is less helpful and less objective. A number of approaches have been developed to control these biases in the classical RLHF literature, but the problem remains relatively under-explored for …
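The title points at a length-aware variant of Direct Preference Optimization (DPO). Below is a minimal Python/PyTorch sketch of the standard DPO objective with an optional length-difference penalty subtracted from the implicit reward margin, to illustrate how a verbosity bias could be discouraged. The `alpha` penalty term, the function name, and the toy inputs are illustrative assumptions; the truncated abstract does not confirm the authors' exact formulation.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             chosen_lengths, rejected_lengths,
             beta=0.1, alpha=0.0):
    """Standard DPO loss over summed per-sequence log-probs.

    alpha > 0 penalizes preference margins that come from the chosen
    response simply being longer; alpha = 0 recovers vanilla DPO.
    (The length term is a hypothetical sketch, not the paper's verified method.)
    """
    # Implicit rewards relative to the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Hypothetical length regularizer: shrink the margin by the length gap.
    length_penalty = alpha * (chosen_lengths - rejected_lengths)
    margin = chosen_rewards - rejected_rewards - length_penalty
    # Bradley-Terry-style negative log-likelihood of the preference.
    return -F.logsigmoid(margin).mean()

# Toy usage: batch of 2 preference pairs with made-up log-probs and lengths.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -15.0]),
    policy_rejected_logps=torch.tensor([-14.0, -13.0]),
    ref_chosen_logps=torch.tensor([-13.0, -15.5]),
    ref_rejected_logps=torch.tensor([-13.5, -13.0]),
    chosen_lengths=torch.tensor([40.0, 95.0]),
    rejected_lengths=torch.tensor([35.0, 50.0]),
    beta=0.1, alpha=0.01,
)
print(loss)

With alpha set above zero, a pair whose chosen response is much longer than its rejected one contributes a smaller effective margin, so the policy gains less from winning on verbosity alone.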