Web: http://arxiv.org/abs/2106.14080

Jan. 31, 2022, 2:11 a.m. | Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan

cs.LG updates on arXiv.org

This work shows that value-aware model learning, known for its numerous
theoretical benefits, is also practically viable for solving challenging
continuous control tasks in prevalent model-based reinforcement learning
algorithms. First, we derive a novel value-aware model learning objective by
bounding the model-advantage, i.e., the model performance difference between
two MDPs or models under a fixed policy. This objective outperforms prior
value-aware objectives in most continuous control environments. Second, we
identify the issue of stale value estimates in naively substituting value-aware …
