Jan. 20, 2022, 2:10 a.m. | Lin Xiao

cs.LG updates on arXiv.org

We consider infinite-horizon discounted Markov decision problems with finite
state and action spaces. We show that with direct parametrization in the policy
space, the weighted value function, although non-convex in general, is both
quasi-convex and quasi-concave. While quasi-convexity helps explain the
convergence of policy gradient methods to global optima, quasi-concavity hints
at their convergence guarantees with arbitrarily large step sizes that are not
dictated by the Lipschitz constant characterizing the smoothness of the value
function. In particular, we show that when …
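For context, the objects named above have standard definitions; the notation below (rho for the initial-state distribution, gamma for the discount factor) is assumed here rather than quoted from the paper. With direct parametrization, the decision variable is the policy pi itself, a point in a product of probability simplices, and the weighted value function is, in LaTeX:

V_\rho(\pi) = \mathbb{E}_{s \sim \rho}\left[ V^\pi(s) \right],
\qquad
V^\pi(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t) \,\middle|\, s_0 = s,\ a_t \sim \pi(\cdot \mid s_t) \right].

Quasi-concavity of V_rho means every superlevel set {pi : V_rho(pi) >= c} is convex; quasi-convexity means every sublevel set is convex.

To make the setup concrete, here is a minimal sketch, not taken from the paper, of projected policy gradient ascent with direct parametrization on a randomly generated toy MDP. The MDP instance, the step size eta, and all helper names are illustrative assumptions.

import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)
# P[s, a] is a probability distribution over next states; r[s, a] is the reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(size=(n_states, n_actions))
rho = np.full(n_states, 1.0 / n_states)  # initial-state distribution

def value(pi):
    """Exact policy evaluation: V^pi = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum('sa,san->sn', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, r)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def q_values(pi):
    """Q^pi(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) V^pi(s')."""
    return r + gamma * np.einsum('san,n->sa', P, value(pi))

def discounted_visitation(pi):
    """Unnormalized discounted visitation: rho^T (I - gamma * P_pi)^{-1}."""
    P_pi = np.einsum('sa,san->sn', pi, P)
    return np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, rho)

def project_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    out = np.empty_like(v)
    for i, row in enumerate(v):
        u = np.sort(row)[::-1]
        css = np.cumsum(u) - 1.0
        k = np.nonzero(u - css / np.arange(1, len(row) + 1) > 0)[0][-1]
        out[i] = np.maximum(row - css[k] / (k + 1), 0.0)
    return out

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy
eta = 5.0  # deliberately large step size, echoing the abstract's theme
for t in range(200):
    # Policy gradient under direct parametrization:
    # dV_rho / dpi(a|s) = d_rho^pi(s) * Q^pi(s, a)
    grad = discounted_visitation(pi)[:, None] * q_values(pi)
    pi = project_simplex(pi + eta * grad)

print("V_rho(pi) after projected policy gradient:", rho @ value(pi))

The large eta here is the regime the abstract alludes to: a step size chosen without reference to the smoothness (Lipschitz) constant of the value function, with the simplex projection keeping every iterate a valid policy.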

Tags: arxiv, gradient, math, policy
