### Web: http://arxiv.org/abs/2111.03941

Jan. 28, 2022, 2:11 a.m. | Seohong Park, Jaekyeom Kim, Gunhee Kim

In reinforcement learning, continuous time is often discretized by a time
scale $\delta$, to which the resulting performance is known to be highly
sensitive. In this work, we seek a $\delta$-invariant algorithm for policy
gradient (PG) methods, one that performs well regardless of the value of
$\delta$. We first identify the underlying reasons that cause PG methods to
fail as $\delta \to 0$, proving that the variance of the PG estimator can
diverge to infinity in stochastic environments under …
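
To make the role of the time scale $\delta$ concrete, here is a minimal Python sketch of one common discretization convention. The convention is an assumption, not something stated in the abstract: each step lasts $\delta$ units of time, the per-step reward is the reward rate times $\delta$, and the per-step discount is $\gamma^{\delta}$. The function names `discretized_return` and `continuous_return` are hypothetical and chosen only for illustration. The sketch shows only that the number of steps grows as $\delta \to 0$ while the discounted return approaches its continuous-time value; it does not reproduce the paper's variance analysis or its algorithm.

```python
import math


def discretized_return(reward_rate, gamma, horizon, delta):
    """Discounted return of a constant reward rate under time scale delta.

    Assumed convention (not taken from the abstract): per-step reward is
    reward_rate * delta and per-step discount is gamma ** delta.
    """
    n_steps = round(horizon / delta)   # the step count grows as delta shrinks
    step_discount = gamma ** delta     # discount applied once per step
    total = 0.0
    for k in range(n_steps):
        total += (step_discount ** k) * reward_rate * delta
    return total


def continuous_return(reward_rate, gamma, horizon):
    """Closed form of the integral of gamma**t * reward_rate over [0, horizon]."""
    return reward_rate * (1.0 - gamma ** horizon) / (-math.log(gamma))


if __name__ == "__main__":
    target = continuous_return(1.0, 0.9, 10.0)
    for delta in (0.5, 0.1, 0.01):
        value = discretized_return(1.0, 0.9, 10.0, delta)
        steps = round(10.0 / delta)
        print(f"delta={delta}: steps={steps}, return={value:.4f}, "
              f"gap to continuous={abs(value - target):.4f}")
```

Under this convention the return is nearly independent of $\delta$, yet an agent acting over the same horizon makes ten times as many decisions when $\delta$ shrinks by a factor of ten, which is the regime the abstract refers to as $\delta \to 0$.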

