Sept. 16, 2022, 1:12 a.m. | Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou

cs.LG updates on arXiv.org

In this work, we study the simple yet universally applicable case of reward
shaping in value-based Deep Reinforcement Learning (DRL). We show that reward
shifting in the form of a linear transformation is equivalent to changing the
initialization of the $Q$-function in function approximation. Based on this
equivalence, we derive the key insight that a positive reward shift leads to
conservative exploitation, while a negative reward shift leads to
curiosity-driven exploration. Accordingly, conservative exploitation improves
offline RL value estimation, …
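To make the equivalence concrete, here is a minimal worked derivation, assuming an infinite-horizon discounted MDP with discount factor $\gamma \in [0, 1)$ and a constant shift $b$ added to every reward (the symbols $b$, $\gamma$, and $r_t$ are standard notation introduced here, not drawn from the excerpt above). For any policy $\pi$,

$$
Q'^{\pi}(s, a) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,(r_t + b) \,\middle|\, s_0 = s,\, a_0 = a\right] \;=\; Q^{\pi}(s, a) + \frac{b}{1 - \gamma}.
$$

In the tabular or fitted setting, learning on the shifted rewards from a zero-initialized $Q$-function therefore behaves like learning on the original rewards from a $Q$-function initialized at $-b/(1-\gamma)$: a positive shift $b$ makes the default initialization relatively pessimistic (conservative exploitation), while a negative shift makes it relatively optimistic (curiosity-driven exploration).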

arxiv deep rl value
