Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods. (arXiv:2111.03941v6 [cs.LG] UPDATED)
Web: http://arxiv.org/abs/2111.03941
Jan. 28, 2022, 2:11 a.m. | Seohong Park, Jaekyeom Kim, Gunhee Kim
cs.LG updates on arXiv.org
In reinforcement learning, continuous time is often discretized by a time
scale $\delta$, to which the resulting performance is known to be highly
sensitive. In this work, we seek to find a $\delta$-invariant algorithm for
policy gradient (PG) methods, which performs well regardless of the value of
$\delta$. We first identify the underlying reasons that cause PG methods to
fail as $\delta \to 0$, proving that the variance of the PG estimator can
diverge to infinity in stochastic environments under …
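To make the role of the time scale $\delta$ concrete, the following is a minimal sketch of what discretizing continuous time means for an agent. It is not the paper's algorithm. The function names (`num_decision_steps`, `euler_rollout`), the toy dynamics dx/dt = a, and the explicit Euler update are all assumptions chosen for illustration. The sketch only shows that, for a fixed physical horizon, the number of discrete decisions grows as 1/$\delta$ when $\delta \to 0$.

```python
def num_decision_steps(horizon: float, delta: float) -> int:
    """Number of discrete decisions covering `horizon` time units at time scale `delta`.

    Hypothetical helper for illustration only.
    """
    return max(1, round(horizon / delta))


def euler_rollout(x0: float, policy, horizon: float, delta: float) -> float:
    """Integrate the toy dynamics dx/dt = policy(x) with explicit Euler steps of size `delta`.

    The dynamics and the integrator are assumptions, not taken from the paper.
    """
    x = x0
    for _ in range(num_decision_steps(horizon, delta)):
        a = policy(x)      # one action is chosen per discrete step
        x = x + delta * a  # Euler update over a time interval of length delta
    return x


if __name__ == "__main__":
    # The same one-unit horizon requires more decisions as delta shrinks.
    for delta in (0.1, 0.01, 0.001):
        steps = num_decision_steps(1.0, delta)
        final_x = euler_rollout(1.0, lambda x: -x, 1.0, delta)
        print(f"delta={delta}: {steps} decisions, final state {final_x:.4f}")
```

In this toy setting the final state changes only slightly as $\delta$ shrinks, while the number of per-step action choices grows by orders of magnitude.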