ODIN: Disentangled Reward Mitigates Hacking in RLHF
Feb. 13, 2024, 5:42 a.m. | Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou, Tom Goldstein, Heng Huang, Mohammad
cs.LG updates on arXiv.org arxiv.org