all AI news
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Feb. 28, 2024, 5:42 a.m. | Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang
cs.LG updates on arXiv.org arxiv.org
Abstract: Preference-based Reinforcement Learning (PbRL) avoids the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL algorithms over-reliance on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method incorporates a sample selection-based discriminator to dynamically filter denoised preferences for robust training. To mitigate the accumulated error caused by …
abstract algorithm algorithms arxiv cs.ai cs.lg current domain domain experts engineering experts feedback human paper quality reinforcement reinforcement learning reliance results robust robustness signal type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Research Scientist
@ Meta | Menlo Park, CA
Principal Data Scientist
@ Mastercard | O'Fallon, Missouri (Main Campus)