April 5, 2024, 4:42 a.m. | Jiacai Liu, Wenye Li, Ke Wei

cs.LG updates on arXiv.org

arXiv:2404.03372v1 Announce Type: cross
Abstract: Projected policy gradient (under the simplex parameterization) and policy gradient and natural policy gradient (under the softmax parameterization) are fundamental algorithms in reinforcement learning. There has been a flurry of recent activity studying these algorithms from a theoretical perspective. Despite this, their convergence behavior is still not fully understood, even given access to exact policy evaluations. In this paper, we focus on the discounted MDP setting and conduct a systematic study of the aforementioned …
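For intuition about the setting, the following is a minimal sketch of tabular policy gradient under the softmax parameterization with exact policy evaluation, run on a small random MDP. The MDP, step size, and iteration count are illustrative assumptions, not details from the paper; the update uses the standard exact gradient d_rho(s) * pi(a|s) * A(s,a) / (1 - gamma).

    import numpy as np

    rng = np.random.default_rng(0)
    S, A, gamma, eta = 4, 3, 0.9, 0.5    # hypothetical sizes and step size

    # Random toy MDP (illustrative only; not from the paper).
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)    # transition kernel P(s'|s,a)
    r = rng.random((S, A))               # rewards r(s,a)
    rho = np.full(S, 1.0 / S)            # initial state distribution

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    theta = np.zeros((S, A))             # softmax policy parameters
    for _ in range(200):
        pi = softmax(theta)                               # pi(a|s)
        P_pi = np.einsum("sa,sab->sb", pi, P)             # transitions under pi
        r_pi = (pi * r).sum(axis=1)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)  # exact evaluation
        Q = r + gamma * P @ V                             # Q(s,a), shape (S, A)
        adv = Q - V[:, None]                              # advantage A(s,a)
        # Discounted state-visitation distribution d_rho(s) under pi.
        d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho)
        # Exact softmax policy gradient ascent step.
        theta += eta * d[:, None] * pi * adv / (1 - gamma)

    print("V^pi(rho):", rho @ V)

Natural policy gradient would instead precondition this update (under softmax it reduces to theta[s,a] += eta * adv[s,a] / (1 - gamma)), and projected policy gradient would update the simplex-parameterized policy directly and project back onto the simplex.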

