Web: http://arxiv.org/abs/2201.10081

Jan. 26, 2022, 2:11 a.m. | Blake Wulfe, Ashwin Balakrishna, Logan Ellis, Jean Mercat, Rowan McAllister, Adrien Gaidon

cs.LG updates on arXiv.org

The ability to learn reward functions plays an important role in enabling the
deployment of intelligent agents in the real world. However, comparing reward
functions, for example as a means of evaluating reward learning methods,
presents a challenge. Reward functions are typically compared by considering
the behavior of optimized policies, but this approach conflates deficiencies in
the reward function with those of the policy search algorithm used to optimize
it. To address this challenge, Gleave et al. (2020) propose the …

