Web: http://arxiv.org/abs/2106.14080

Jan. 31, 2022, 2:11 a.m. | Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan

cs.LG updates on arXiv.org

This work shows that value-aware model learning, known for its numerous
theoretical benefits, is also practically viable for solving challenging
continuous control tasks in prevalent model-based reinforcement learning
algorithms. First, we derive a novel value-aware model learning objective by
bounding the model-advantage, i.e., the performance difference between two
MDPs or models under a fixed policy, achieving superior performance to prior
value-aware objectives in most continuous control environments. Second, we
identify the issue of stale value estimates that arises when naively substituting value-aware …
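
For context, the model-advantage referenced in the abstract is typically the gap in expected return between a learned model $\hat{M}$ and the true MDP $M$ under the same policy $\pi$, and value-aware objectives train the model to match predicted values rather than raw transitions. A minimal sketch of these quantities in standard notation (the paper's exact bound may differ; $\mu$ denotes a state-action distribution and $V$ a value estimate, both assumptions here):

\[
\mathcal{A}_{\pi}(\hat{M}, M) \;=\; J_{\hat{M}}(\pi) - J_{M}(\pi),
\qquad
J_{M}(\pi) \;=\; \mathbb{E}_{\pi, M}\Big[\textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\Big],
\]

\[
\ell(\hat{P}; V) \;=\; \mathbb{E}_{(s,a) \sim \mu}\Big[\big(\mathbb{E}_{s' \sim \hat{P}(\cdot \mid s,a)}[V(s')] \;-\; \mathbb{E}_{s' \sim P(\cdot \mid s,a)}[V(s')]\big)^{2}\Big].
\]

The second expression is the general shape of prior value-aware losses such as VAML (Farahmand et al., 2017); the "stale value estimates" issue the abstract mentions plausibly arises because $V$ must be refreshed as the policy and model change during training.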

Tags: arxiv, learning, model, models, reinforcement learning, theory, value
