Feb. 13, 2024, 5:42 a.m. | Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts

cs.LG updates on arXiv.org

Recently there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional …
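For context, here is a minimal sketch of classical potential-based reward shaping (Ng et al., 1999), the technique the abstract builds on. PBRS adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which provably leaves the set of optimal policies unchanged when Φ is a fixed function of state. The grid-world potential, goal location, and parameter values below are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch of potential-based reward shaping (PBRS), per Ng et al. (1999).
# The potential function, goal location, and discount factor are illustrative
# assumptions; they are not the paper's method, which extends PBRS to
# trainable intrinsic-motivation rewards.

GAMMA = 0.99  # discount factor

def potential(state):
    # Fixed, state-only potential Phi(s): negative Manhattan distance to an
    # assumed goal cell in a toy grid world.
    goal = (4, 4)
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(env_reward, state, next_state, done):
    # F(s, s') = gamma * Phi(s') - Phi(s). Adding F to the environment reward
    # preserves the optimal-policy set because F telescopes along trajectories.
    next_potential = 0.0 if done else potential(next_state)  # terminal Phi := 0
    return env_reward + GAMMA * next_potential - potential(state)

if __name__ == "__main__":
    # Toy transition moving toward the assumed goal: the shaping term is positive.
    print(shaped_reward(env_reward=0.0, state=(0, 0), next_state=(0, 1), done=False))
```

The policy-invariance guarantee rests on the shaping terms telescoping: over any trajectory, the discounted sum of F collapses to a constant offset of Φ(s₀), so every policy's return shifts equally and the optimal-policy set is unchanged. As the abstract notes, many IM bonuses are themselves trainable functions depending on more than the current state, so this guarantee does not directly apply to them.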

