March 12, 2024, 4:42 a.m. | Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

cs.LG updates on arXiv.org

arXiv:2403.06806v1 Announce Type: new
Abstract: We present the first finite-time global convergence analysis of policy gradient in the context of infinite-horizon average-reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action spaces. Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O\left(\frac{1}{T}\right)$, which translates to $O\left(\log(T)\right)$ regret, where $T$ represents the number of iterations. Prior work on performance bounds for discounted …
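To make the setting concrete, here is a minimal sketch, not the paper's algorithm, of exact policy gradient ascent on the average reward of a small ergodic tabular MDP with a softmax policy. The MDP (P, r), the step size eta, and the iteration count are illustrative assumptions; the paper analyzes convergence rates for this general class of problems rather than this particular instance.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's algorithm): exact policy
# gradient ascent on the average reward of a small ergodic tabular MDP
# with a softmax policy. The MDP (P, r), step size, and iteration count
# are illustrative placeholders.

rng = np.random.default_rng(0)
S, A = 4, 2                                   # finite state/action spaces
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, :] = next-state distribution
r = rng.random((S, A))                        # bounded rewards in [0, 1]

def softmax_policy(theta):
    # pi[s, a] from per-state logits theta[s, a]
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def average_reward_and_grad(theta):
    # Exact gradient of the average reward rho(theta) via the
    # average-reward policy gradient theorem.
    pi = softmax_policy(theta)
    P_pi = np.einsum('sa,sab->sb', pi, P)     # state chain under pi
    r_pi = (pi * r).sum(axis=1)               # expected reward per state

    # Stationary distribution d (unique because the chain is ergodic).
    evals, evecs = np.linalg.eig(P_pi.T)
    d = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    d = d / d.sum()

    rho = d @ r_pi                            # average reward
    # Differential (bias) values: solve (I - P_pi + 1 d^T) v = r_pi - rho.
    v = np.linalg.solve(np.eye(S) - P_pi + np.outer(np.ones(S), d), r_pi - rho)
    q = r - rho + P @ v                       # differential action values
    adv = q - (pi * q).sum(axis=1, keepdims=True)
    grad = d[:, None] * pi * adv              # gradient of rho w.r.t. softmax logits
    return rho, grad

theta = np.zeros((S, A))
eta = 1.0                                     # illustrative step size
for t in range(2000):
    rho, grad = average_reward_and_grad(theta)
    theta += eta * grad
print(f"average reward after {t + 1} iterations: {rho:.4f}")
```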

