[D] How to detect or infer the "reward-hacking" phenomenon?
March 3, 2024, 1:57 p.m. | /u/zetiansss
Machine Learning www.reddit.com
In the paper, the authors use the KL-reward curve to detect reward hacking: they say that once the reward starts to decrease, reward hacking is happening. However, previous papers such as [https://arxiv.org/pdf/2312.09244.pdf](https://arxiv.org/pdf/2312.09244.pdf) often use two reward models to detect reward hacking: a proxy reward and a true reward. The policy model is updated under the proxy reward, so …
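To make the two-reward-model check concrete, here is a minimal sketch. It assumes you have already logged the mean proxy reward and the mean true reward at a series of checkpoints, ordered by training progress (or by KL from the initial policy). The function name `first_divergence`, the `patience` parameter, and the toy numbers are all hypothetical and not taken from either paper; the rule "proxy goes up while true goes down" is just one simple way to flag the divergence.

```python
from typing import Optional, Sequence


def first_divergence(
    proxy: Sequence[float],
    true: Sequence[float],
    patience: int = 2,
) -> Optional[int]:
    """Return the index of the first checkpoint of a run of `patience`
    consecutive steps where the proxy reward rises while the true reward
    falls. Return None if no such run exists."""
    if len(proxy) != len(true):
        raise ValueError("proxy and true must have the same length")
    if patience < 1:
        raise ValueError("patience must be at least 1")

    run = 0  # length of the current streak of diverging steps
    for i in range(1, len(proxy)):
        proxy_up = proxy[i] > proxy[i - 1]
        true_down = true[i] < true[i - 1]
        if proxy_up and true_down:
            run += 1
            if run >= patience:
                # Report where the streak started, not where it was confirmed.
                return i - patience + 1
        else:
            run = 0
    return None


# Toy curves (made-up numbers): the proxy keeps improving,
# while the true reward peaks at checkpoint 2 and then drops.
proxy_curve = [0.1, 0.3, 0.5, 0.7, 0.9]
true_curve = [0.1, 0.3, 0.4, 0.35, 0.2]
hack_start = first_divergence(proxy_curve, true_curve)
```

A larger `patience` makes the check less sensitive to noise in the logged rewards, at the cost of confirming the divergence later.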