Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients. (arXiv:2109.11788v3 [cs.LG] UPDATED)
cs.LG updates on arXiv.org
Approximation of the value functions in value-based deep reinforcement
learning induces overestimation bias, resulting in suboptimal policies. We show
that when the reinforcement signals received by the agents have a high
variance, deep actor-critic approaches that overcome the overestimation bias
lead to a substantial underestimation bias. We first address the detrimental
issues in the existing approaches that aim to overcome such underestimation
error. Then, through extensive statistical analysis, we introduce a novel,
parameter-free Deep Q-learning variant to reduce this underestimation …
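
The small Monte Carlo sketch below illustrates the two biases described above. It is not the authors' method. It assumes that every action's true value is 0 and that each critic returns that value plus independent Gaussian noise with standard deviation `noise_std`. Two estimators are compared. The first is a greedy max over one critic's estimates, a stand-in for a standard Q-learning target. The second is the minimum of two critics' estimates, the clipped double-Q rule used by TD3. That rule is chosen here only as an assumed example of an approach that counters overestimation. The function `simulate_bias` and its parameters are hypothetical names used for illustration.

```python
import math
import random
import statistics


def simulate_bias(noise_std: float, n_actions: int = 10, n_trials: int = 20000, seed: int = 0):
    """Monte Carlo estimate of the bias of two target estimators (illustrative only).

    Assumption: every action's true value is 0, and each critic returns the true
    value plus independent Gaussian noise with standard deviation noise_std.
    Returns (bias of greedy max, bias of min over two critics).
    """
    rng = random.Random(seed)
    max_estimates = []  # single critic, greedy max over actions (Q-learning-style target)
    min_estimates = []  # minimum of two critics for one action (clipped double-Q, TD3-style)
    for _ in range(n_trials):
        noisy_q = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        max_estimates.append(max(noisy_q))
        q1 = rng.gauss(0.0, noise_std)
        q2 = rng.gauss(0.0, noise_std)
        min_estimates.append(min(q1, q2))
    # The true value is 0, so each estimator's sample mean is its bias.
    return statistics.fmean(max_estimates), statistics.fmean(min_estimates)


if __name__ == "__main__":
    for sigma in (0.1, 1.0, 5.0):
        over, under = simulate_bias(sigma)
        # Analytic bias of the min of two iid N(0, sigma^2) draws is -sigma / sqrt(pi).
        print(f"sigma={sigma}: max bias={over:+.3f}, "
              f"min bias={under:+.3f} (theory {-sigma / math.sqrt(math.pi):+.3f})")
```

In this toy setting, both biases scale with the noise level. For 10 actions, the greedy max overestimates by roughly 1.54σ. The two-critic minimum underestimates by about σ/√π. This loosely mirrors the abstract's point that high-variance reinforcement signals push overestimation-correcting methods into substantial underestimation.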