April 2, 2024, 7:44 p.m. | Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang

cs.LG updates on arXiv.org arxiv.org

arXiv:1912.05830v4 Announce Type: replace
Abstract: While policy-based reinforcement learning (RL) has achieved tremendous success in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge this gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. This paper proves that, in the problem of …
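To make the idea of an "optimistic" policy gradient direction concrete, below is a minimal sketch of one policy-improvement step in a tabular setting: a count-based UCB-style bonus is added to the estimated action values before a PPO/mirror-descent style (KL-regularized) update. The bonus form, the tabular setup, and all names (oppo_style_update, alpha, beta) are illustrative assumptions, not the paper's exact linear-MDP construction.

```python
import numpy as np

def oppo_style_update(policy, q_hat, visit_counts, alpha=0.1, beta=1.0):
    """One illustrative optimistic policy-improvement step (sketch, not the paper's algorithm).

    policy:       (S, A) array of current policy probabilities per state.
    q_hat:        (S, A) array of estimated action values.
    visit_counts: (S, A) array of state-action visit counts.
    alpha:        step size of the mirror-descent (soft) update.
    beta:         scale of the UCB-style exploration bonus.
    """
    # Optimistic value estimate: add a count-based exploration bonus
    # (a simple stand-in for the paper's uncertainty quantification).
    bonus = beta / np.sqrt(np.maximum(visit_counts, 1))
    q_optimistic = q_hat + bonus

    # KL-regularized (mirror-descent / PPO-style) improvement:
    # new policy is proportional to old policy times exp(alpha * optimistic Q).
    logits = np.log(policy + 1e-12) + alpha * q_optimistic
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True)
    return new_policy
```

In this sketch, optimism enters only through the bonus added to q_hat, so the policy is pushed toward under-visited state-action pairs, which is the exploration mechanism the abstract alludes to.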

