Web: http://arxiv.org/abs/2201.10081

Jan. 26, 2022, 2:11 a.m. | Blake Wulfe, Ashwin Balakrishna, Logan Ellis, Jean Mercat, Rowan McAllister, Adrien Gaidon

cs.LG updates on arXiv.org

The ability to learn reward functions plays an important role in enabling the
deployment of intelligent agents in the real world. However, comparing reward
functions, for example as a means of evaluating reward learning methods,
presents a challenge. Reward functions are typically compared by considering
the behavior of optimized policies, but this approach conflates deficiencies in
the reward function with those of the policy search algorithm used to optimize
it. To address this challenge, Gleave et al. (2020) propose the …

