March 11, 2024, 4:41 a.m. | Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang

cs.LG updates on arXiv.org

arXiv:2403.05006v1 Announce Type: new
Abstract: Reinforcement learning with human feedback (RLHF) is an emerging paradigm for aligning models with human preferences. Typically, RLHF aggregates preferences from multiple individuals whose diverse viewpoints may conflict with each other. Our work initiates the theoretical study of multi-party RLHF, which explicitly models the diverse preferences of multiple individuals. We show how traditional RLHF approaches can fail, since learning a single reward function cannot capture and balance the preferences of multiple individuals. To …
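Since the abstract is cut off before the paper's approach, the sketch below only illustrates the failure mode it describes. It is a minimal, hypothetical toy example, not the paper's construction: two annotators hold conflicting Bradley-Terry preferences over two responses; pooling their comparisons into a single reward function collapses the conflict into one averaged preference, and the resulting reward-maximizing policy serves only the majority side, whereas keeping one reward per annotator and aggregating with a Nash social welfare objective (one standard multi-party aggregation rule, assumed here for illustration) yields a deliberate compromise.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy setup (illustrative assumptions, not the paper's construction):
# two candidate responses {0, 1} and two annotators with conflicting
# Bradley-Terry rewards, where P_i(0 > 1) = sigmoid(r[i, 0] - r[i, 1]).
r = np.array([[ 2.0, -2.0],   # annotator 0 prefers response 0
              [-1.0,  1.0]])  # annotator 1 prefers response 1

# Pooling all comparisons and fitting a single reward function reduces
# the conflict to one averaged preference probability ...
pooled = sigmoid(r[:, 0] - r[:, 1]).mean()          # ~0.55
single_gap = np.log(pooled / (1.0 - pooled))        # implied r(0) - r(1)
# ... and a reward-maximizing policy then always picks response 0,
# leaving annotator 1 with their least-preferred outcome.
print(f"pooled P(0 > 1) = {pooled:.2f}, implied reward gap = {single_gap:.2f}")

# Multi-party alternative: keep one reward per annotator and choose the
# policy p = P(choose response 0) that maximizes Nash social welfare,
# i.e. the product of (positively shifted) per-annotator expected rewards.
shift = 3.0                                  # keeps utilities positive
p = np.linspace(0.0, 1.0, 1001)
u = np.stack([p * (ri[0] + shift) + (1 - p) * (ri[1] + shift) for ri in r])
nash = np.log(u).sum(axis=0)                 # log of the Nash product
print(f"Nash-welfare policy: P(response 0) = {p[np.argmax(nash)]:.3f}")
# -> ~0.875: the stronger preference gets more weight, but annotator 1
#    is not ignored, unlike under the single pooled reward.
```

The contrast is the point of the abstract's claim: the single-reward model silently averages away the disagreement and then optimizes it to a corner, while explicitly modeling each party's reward preserves the conflict and lets the aggregation rule trade it off.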

