Feb. 20, 2024, 5:44 a.m. | Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu

cs.LG updates on arXiv.org arxiv.org

arXiv:2312.12065v2 Announce Type: replace
Abstract: Proximal Policy Optimization algorithm employing a clipped surrogate objective (PPO-Clip) is a prominent exemplar of the policy optimization methods. However, despite its remarkable empirical success, PPO-Clip lacks theoretical substantiation to date. In this paper, we contribute to the field by establishing the first global convergence results of a PPO-Clip variant in both tabular and neural function approximation settings. Our findings highlight the $O(1/\sqrt{T})$ min-iterate convergence rate specifically in the context of neural function approximation. We …

abstract algorithm arxiv clip convergence cs.ai cs.lg global optimization paper policy ppo success type

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Data Science Analyst

@ Mayo Clinic | AZ, United States

Sr. Data Scientist (Network Engineering)

@ SpaceX | Redmond, WA