Web: http://arxiv.org/abs/2206.03383

June 16, 2022, 1:11 a.m. | Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang

cs.LG updates on arXiv.org

Offline reinforcement learning (RL) enables effective learning from
previously collected data without exploration, which shows great promise in
real-world applications where exploration is expensive or even infeasible. The
discount factor, $\gamma$, plays a vital role in improving online RL sample
efficiency and estimation accuracy, but its role in offline RL is not well
explored. This paper examines two distinct effects of $\gamma$ in offline RL
through theoretical analysis: the regularization effect and the pessimism
effect. …

