MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Feb. 15, 2024, 5:42 a.m. | Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
cs.LG updates on arXiv.org
Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. To provide an equitable solution to the problem, we learn a …
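The MaxMin idea named in the title can be sketched briefly: instead of optimizing a policy against one reward model averaged over all annotators, the objective maximizes the minimum expected reward across preference groups, so the worst-served group drives the update. The snippet below is a minimal illustrative sketch in PyTorch, not the authors' implementation; the function name maxmin_objective and the toy per-group reward values are assumptions introduced for illustration.

```python
import torch

def maxmin_objective(group_rewards: torch.Tensor) -> torch.Tensor:
    """Egalitarian MaxMin objective over per-group expected rewards.

    group_rewards: tensor of shape (num_groups,), where entry g is the
    expected reward of the current policy under group g's reward model.
    Maximizing the minimum entry pushes the policy toward equitable
    treatment of all preference groups, instead of the implicit
    averaging done by single-reward RLHF. (Illustrative only; the
    paper's full method also learns the groups via an EM procedure.)
    """
    return torch.min(group_rewards)

# Toy usage: three hypothetical preference groups, one policy score each.
rewards = torch.tensor([0.8, 0.3, 0.6], requires_grad=True)
loss = -maxmin_objective(rewards)  # maximize the min => minimize its negative
loss.backward()
print(rewards.grad)  # gradient is nonzero only for the worst-off group
```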