March 29, 2024, 4:42 a.m. | Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

cs.LG updates on arXiv.org

arXiv:2403.19159v1 Announce Type: cross
Abstract: Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent answer is often rated more highly by users, even when it is less helpful and objective. A number of approaches have been developed to control these biases in the classical RLHF literature, but the problem remains relatively under-explored for …
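For context on the kind of objective this line of work operates on, below is a minimal sketch of a Direct Preference Optimization (DPO) style loss with an optional per-token length penalty to counteract the verbosity bias described above. The penalty term, the coefficient alpha, and all function and variable names are illustrative assumptions, not the paper's stated method (the abstract above is truncated and does not specify one).

    # Hedged sketch: DPO-style pairwise loss with an assumed length-penalty term.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps,
                 chosen_lengths, rejected_lengths,
                 beta=0.1, alpha=0.0):
        """Pairwise preference loss on (chosen, rejected) responses.

        The implicit reward of each response is beta * (log pi_theta - log pi_ref).
        Setting alpha > 0 subtracts a length difference term so that longer
        responses cannot win on verbosity alone; alpha = 0 recovers vanilla DPO.
        """
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        margin = chosen_rewards - rejected_rewards
        # Assumed regularizer: penalize the chosen response for being longer.
        margin = margin - alpha * (chosen_lengths - rejected_lengths)
        return -F.logsigmoid(margin).mean()

Here the log-probabilities are summed over response tokens under the trained policy and a frozen reference model, and the lengths are token counts; how (or whether) the paper regularizes length is not recoverable from the truncated abstract.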

