On the Convergence Rates of Policy Gradient Methods. (arXiv:2201.07443v1 [math.OC])
Jan. 20, 2022, 2:10 a.m. | Lin Xiao
cs.LG updates on arXiv.org arxiv.org
We consider infinite-horizon discounted Markov decision problems with finite
state and action spaces. We show that with direct parametrization in the policy
space, the weighted value function, although non-convex in general, is both
quasi-convex and quasi-concave. While quasi-convexity helps explain the
convergence of policy gradient methods to global optima, quasi-concavity hints
at their convergence guarantees using arbitrarily large step sizes that are not
dictated by the Lipschitz constant characterizing smoothness of the value
function. In particular, we show that when …
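
As a rough illustration of the setting, the sketch below runs exact projected policy gradient ascent with direct parametrization on a small tabular MDP. It is a minimal sketch under stated assumptions, not the paper's algorithm or step-size rule: the truncated abstract does not say which variant is analyzed, and all function names (project_simplex, evaluate, policy_gradient, projected_pg, value_iteration) are hypothetical. It uses the standard gradient formula for this parametrization, dV_rho / d pi(a|s) = d_rho^pi(s) Q^pi(s,a) / (1 - gamma), where d_rho^pi is the normalized discounted state-visitation distribution.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    last = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[last] / (last + 1)
    return np.maximum(v - tau, 0.0)

def evaluate(P, R, gamma, pi):
    """Exact policy evaluation. P: (S, A, S), R: (S, A), pi: (S, A)."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)     # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)             # expected one-step reward
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V                     # shape (S, A)
    return V, Q, P_pi

def policy_gradient(P, R, gamma, rho, pi):
    """Gradient of the rho-weighted value with respect to pi(a|s)."""
    S = P.shape[0]
    _, Q, P_pi = evaluate(P, R, gamma, pi)
    # d = (1 - gamma) * rho^T (I - gamma * P_pi)^{-1}
    d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho)
    return d[:, None] * Q / (1 - gamma)

def projected_pg(P, R, gamma, rho, step, iters):
    """Projected gradient ascent from the uniform policy (illustrative only)."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)
    for _ in range(iters):
        g = policy_gradient(P, R, gamma, rho, pi)
        # Project each state's action distribution back onto the simplex.
        pi = np.apply_along_axis(project_simplex, 1, pi + step * g)
    V, _, _ = evaluate(P, R, gamma, pi)
    return pi, float(rho @ V)

def value_iteration(P, R, gamma, iters=2000):
    """Reference optimal values, used only to check the result."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = np.max(R + gamma * P @ V, axis=1)
    return V
```

With a strictly positive initial distribution rho and a large step size, each projected step tends to move a state's policy to the greedy action for the current Q-values, so the iteration behaves much like policy iteration. That is one informal way to read the abstract's remark about large step sizes.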