Web: http://arxiv.org/abs/2201.09457

Jan. 26, 2022, 2:11 a.m. | Yan Li, Tuo Zhao, Guanghui Lan

cs.LG updates on arXiv.org

We propose the homotopic policy mirror descent (HPMD) method for solving
discounted, infinite-horizon MDPs with finite state and action spaces, and study
its policy convergence. We report three properties that appear to be new in the
literature on policy gradient methods: (1) The policy first converges linearly,
then superlinearly with order $\gamma^{-2}$, to the set of optimal policies
after $\mathcal{O}(\log(1/\Delta^*))$ iterations, where $\Delta^*$ is
defined via a gap quantity associated with the optimal state-action value
function; (2) …
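The abstract does not spell out the HPMD update itself, but the "policy mirror descent" family it builds on has a well-known basic step: each iterate multiplies the current policy by the exponentiated state-action values and renormalizes (a KL mirror-descent / multiplicative-weights update). The sketch below illustrates that generic step on a small random MDP; it is not the paper's HPMD method (which additionally uses a homotopy schedule on a regularization term), and the MDP sizes, step size `eta`, and iteration count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a generic policy mirror descent (PMD) step on a tiny
# random MDP. This is NOT the paper's HPMD method (HPMD adds a homotopy /
# diminishing-regularization schedule); it only illustrates the basic
# mirror-descent update pi_{k+1}(a|s) ∝ pi_k(a|s) * exp(eta * Q^{pi_k}(s,a)).

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 3, 0.9          # illustrative sizes, not from the paper

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s'] transition probs
R = rng.uniform(size=(nS, nA))                 # rewards r(s, a)

def q_values(pi):
    """Exact Q^pi for the finite MDP via the Bellman linear system."""
    P_pi = np.einsum('sa,sap->sp', pi, P)      # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, R)        # expected reward under pi
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    return R + gamma * P @ V                   # Q[s, a]

pi = np.full((nS, nA), 1.0 / nA)   # start from the uniform policy
eta = 1.0                          # constant step size (assumption)
for _ in range(200):
    Q = q_values(pi)
    pi = pi * np.exp(eta * Q)      # multiplicative-weights / KL mirror step
    pi /= pi.sum(axis=1, keepdims=True)

print(np.argmax(pi, axis=1))       # greedy action chosen in each state
```

Because each PMD step is a policy-improvement step, the iterates' values are monotonically nondecreasing, and on a finite MDP the policy concentrates on the greedy actions; the paper's convergence rates quantify how fast this concentration happens.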

