Exploiting Reward Shifting in Value-Based Deep RL. (arXiv:2209.07288v1 [cs.LG])
Sept. 16, 2022, 1:12 a.m. | Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou
cs.LG updates on arXiv.org arxiv.org
In this work, we study the simple yet universally applicable case of reward
shaping in value-based Deep Reinforcement Learning (DRL). We show that reward
shifting in the form of a linear transformation is equivalent to changing the
initialization of the $Q$-function in function approximation. Based on this
equivalence, we arrive at the key insight that a positive reward shift leads to
conservative exploitation, while a negative reward shift leads to
curiosity-driven exploration. Accordingly, conservative exploitation improves
offline RL value estimation, …
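The stated equivalence can be checked directly in the tabular case: running Q-learning on rewards shifted by a constant $c$ produces, step for step, the same $Q$-values as running it on the original rewards with the initialization shifted by $-c/(1-\gamma)$, up to the constant offset $c/(1-\gamma)$. The sketch below is a minimal illustration of that identity, not the authors' code; the toy MDP, its transition table `P`, and its reward table `R` are hypothetical.

```python
import numpy as np

# Hypothetical deterministic MDP: 2 states, 2 actions.
# P[s, a] gives the next state; R[s, a] gives the reward.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 1.0], [0.5, -0.2]])
gamma, alpha, c = 0.9, 0.5, 2.0  # c is the constant reward shift

def q_learning(reward, q_init, sweeps=10):
    """Synchronous tabular Q-learning: sweep every (s, a) pair each iteration."""
    Q = np.full_like(reward, q_init)
    for _ in range(sweeps):
        for s in range(2):
            for a in range(2):
                s_next = P[s, a]
                target = reward[s, a] + gamma * Q[s_next].max()
                Q[s, a] += alpha * (target - Q[s, a])
    return Q

k = c / (1 - gamma)                               # induced Q-value offset
Q_shifted = q_learning(R + c, q_init=0.0)         # shift every reward by +c
Q_reinit = q_learning(R, q_init=-k)               # shift the initialization instead
# The two runs agree exactly, up to the constant offset k:
print(np.allclose(Q_shifted, Q_reinit + k))       # True
```

Because relative pessimism or optimism of the initialization is what drives exploitation versus exploration, this identity is why a positive shift (equivalently, a pessimistic initialization of the unshifted problem) behaves conservatively, while a negative shift behaves optimistically.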