On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes
March 12, 2024, 4:42 a.m. | Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor
Source: cs.LG updates on arXiv.org (arxiv.org)
Abstract: We present the first finite-time global convergence analysis of policy gradient for infinite-horizon average-reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action spaces. Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O\left(\frac{1}{T}\right)$, which translates to $O\left(\log(T)\right)$ regret, where $T$ is the number of iterations. Prior work on performance bounds for discounted …
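The step from the $O\left(\frac{1}{T}\right)$ rate to $O\left(\log(T)\right)$ regret can be sketched as follows. The notation here is an assumption, not taken from the abstract: $\rho^*$ denotes the optimal average reward, $\rho^{\pi_t}$ the average reward of the $t$-th policy iterate, $C > 0$ a problem-dependent constant, and regret is taken to mean the cumulative suboptimality over iterations. If the rate holds at every iteration $t \le T$, then

$$
\rho^* - \rho^{\pi_t} \le \frac{C}{t}
\quad\Longrightarrow\quad
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left(\rho^* - \rho^{\pi_t}\right)
\le C \sum_{t=1}^{T} \frac{1}{t}
\le C\left(1 + \ln T\right)
= O\left(\log(T)\right).
$$

The last inequality is the standard bound on the harmonic sum; the constants in the paper's actual statement may differ.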