Web: http://arxiv.org/abs/2111.03941

Jan. 28, 2022, 2:11 a.m. | Seohong Park, Jaekyeom Kim, Gunhee Kim

cs.LG updates on arXiv.org

In reinforcement learning, continuous time is often discretized by a time
scale $\delta$, to which the resulting performance is known to be highly
sensitive. In this work, we seek to find a $\delta$-invariant algorithm for
policy gradient (PG) methods, which performs well regardless of the value of
$\delta$. We first identify the underlying reasons that cause PG methods to
fail as $\delta \to 0$, proving that the variance of the PG estimator can
diverge to infinity in stochastic environments under …
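
To make the sensitivity to $\delta$ concrete, here is a minimal sketch that estimates the empirical variance of a plain REINFORCE-style gradient sample at several time scales. It is a toy illustration only, not the authors' setting, proof, or algorithm. The 1-D noisy dynamics, the quadratic cost, the Gaussian policy with mean `theta`, and the names `pg_sample` and `pg_variance` are all assumptions made for this example.

```python
import random
import statistics


def pg_sample(delta, theta=0.0, sigma=1.0, horizon=1.0, rng=random):
    """One REINFORCE gradient sample on a toy 1-D problem (assumed setup).

    The continuous-time horizon is cut into round(horizon / delta) steps.
    """
    n_steps = round(horizon / delta)
    x = 0.0
    score_sum = 0.0  # sum over steps of d/dtheta log pi(a_t)
    ret = 0.0        # discretized return: sum of r(x_t) * delta
    for _ in range(n_steps):
        a = rng.gauss(theta, sigma)           # Gaussian policy, mean theta
        score_sum += (a - theta) / sigma**2   # score of the Gaussian policy
        ret += -(x * x) * delta               # quadratic cost, scaled by delta
        # Stochastic environment: drift from the action plus Brownian-style noise.
        x += a * delta + rng.gauss(0.0, 1.0) * delta**0.5
    return score_sum * ret


def pg_variance(delta, n_episodes=2000, seed=0):
    """Empirical variance of the gradient sample over independent episodes."""
    rng = random.Random(seed)
    samples = [pg_sample(delta, rng=rng) for _ in range(n_episodes)]
    return statistics.variance(samples)


if __name__ == "__main__":
    for delta in (0.1, 0.05, 0.01):
        print(f"delta={delta}: variance ~ {pg_variance(delta):.2f}")
```

In this toy setup the return stays bounded while the summed score accumulates one independent term per step, so the estimated variance grows as $\delta$ shrinks. The conditions under which the paper proves divergence are stated in the paper itself and are not reproduced here.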
