all AI news
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Feb. 20, 2024, 5:44 a.m. | Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu
cs.LG updates on arXiv.org arxiv.org
Abstract: Proximal Policy Optimization algorithm employing a clipped surrogate objective (PPO-Clip) is a prominent exemplar of the policy optimization methods. However, despite its remarkable empirical success, PPO-Clip lacks theoretical substantiation to date. In this paper, we contribute to the field by establishing the first global convergence results of a PPO-Clip variant in both tabular and neural function approximation settings. Our findings highlight the $O(1/\sqrt{T})$ min-iterate convergence rate specifically in the context of neural function approximation. We …
abstract algorithm arxiv clip convergence cs.ai cs.lg global optimization paper policy ppo success type
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Data Science Analyst
@ Mayo Clinic | AZ, United States
Sr. Data Scientist (Network Engineering)
@ SpaceX | Redmond, WA