June 6, 2024, 4:42 a.m. | Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao

cs.LG updates on arXiv.org

arXiv:2406.02764v1 Announce Type: new
Abstract: Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address this uncertainty …
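For context, below is a minimal sketch of the standard pairwise (Bradley-Terry) preference loss commonly used for RLHF reward modeling, which weights every preference pair equally; this is the uniform treatment of pairs that the abstract identifies as failing to capture varying preference strengths. The function and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    """Standard pairwise preference loss for reward learning:
    -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    Every pair contributes with equal weight, regardless of how
    strong or weak the underlying human preference is."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical usage with a reward model that scores each segment:
# reward_chosen = reward_model(chosen_segments)      # shape: (batch,)
# reward_rejected = reward_model(rejected_segments)  # shape: (batch,)
# loss = bradley_terry_loss(reward_chosen, reward_rejected)
# loss.backward()
```

The proposed adaptive loss in the paper departs from this baseline by reweighting pairs via distributionally robust optimization; the sketch above only illustrates the uniform-weight starting point.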

