Sept. 16, 2022, 1:12 a.m. | Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou

cs.LG updates on arXiv.org

In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the $Q$-function in function approximation. Based on this equivalence, we derive the key insight that a positive reward shift leads to conservative exploitation, while a negative reward shift leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, …
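
The core claim lends itself to a quick numerical check. Below is a minimal sketch (not the paper's code) that runs $Q$-value iteration on a hypothetical two-state, two-action MDP: adding a constant $b$ to every reward shifts the converged $Q$-values uniformly by $b/(1-\gamma)$, which has the same effect as changing the $Q$-function's initialization. Relative to the shifted problem, a zero-initialized $Q$-function is therefore pessimistic when $b > 0$ (conservative exploitation) and optimistic when $b < 0$ (curiosity-driven exploration). The MDP's transition and reward values are made up purely for illustration.

```python
import numpy as np

# Minimal sketch (not the authors' code) of the claimed equivalence:
# shifting every reward by a constant b shifts the optimal Q-values
# uniformly by b / (1 - gamma), i.e. it acts like a different
# initialization of the Q-function. The MDP below is hypothetical.

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.2, 0.8]],    # P[s, a, s'] transition probs
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],                  # R[s, a] rewards
              [0.5, 2.0]])

def q_value_iteration(R, P, gamma, iters=2000):
    """Solve for Q* by repeated Bellman optimality backups."""
    Q = np.zeros_like(R)                   # zero initialization
    for _ in range(iters):
        V = Q.max(axis=1)                  # greedy state values
        Q = R + gamma * P @ V              # Q(s,a) = R(s,a) + gamma * E[V(s')]
    return Q

b = 1.5                                    # constant reward shift
Q_orig = q_value_iteration(R, P, gamma)
Q_shift = q_value_iteration(R + b, P, gamma)

# Every entry of the fixed point moves by exactly b / (1 - gamma), so a
# zero init is pessimistic for b > 0 and optimistic for b < 0.
assert np.allclose(Q_shift - Q_orig, b / (1 - gamma))
print(Q_shift - Q_orig)                    # each entry ~= 15.0
```

Running the script prints a 2×2 array whose entries are all approximately 15.0, matching $b/(1-\gamma) = 1.5/0.1$; the shift never changes the greedy policy, only the baseline against which the initialization is optimistic or pessimistic.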
