May 25, 2022, 1:10 a.m. | Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, Dacheng Tao

cs.LG updates on arXiv.org

Safe reinforcement learning aims to learn the optimal policy while satisfying
safety constraints, which is essential in real-world applications. However,
current algorithms still struggle to make efficient policy updates while
satisfying hard constraints. In this paper, we propose Penalized Proximal Policy
Optimization (P3O), which solves the cumbersome constrained policy iteration
via a single minimization of an equivalent unconstrained problem. Specifically,
P3O utilizes a simple-yet-effective penalty function to eliminate cost
constraints and removes the trust-region constraint by using the clipped
surrogate objective. We …
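
The idea of folding the cost constraint into a penalty term and using a clipped surrogate in place of a trust region can be sketched as a single loss to minimize. This is only an illustrative sketch, not the authors' implementation: the excerpt does not give the exact form of the penalty, so the ReLU-style penalty, the function name `penalized_clipped_loss`, and the parameters `kappa`, `eps`, and `cost_violation` are all assumptions made here for illustration.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def penalized_clipped_loss(ratios, reward_adv, cost_adv, cost_violation,
                           kappa=20.0, eps=0.2):
    """Hypothetical unconstrained loss (to be minimized).

    ratios         -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    reward_adv     -- reward advantage estimates, one per sample
    cost_adv       -- cost advantage estimates, one per sample
    cost_violation -- (current expected cost) - (cost limit); assumed given
    kappa          -- penalty coefficient (assumed hyperparameter)
    eps            -- PPO-style clipping range
    """
    n = len(ratios)
    # Standard PPO clipped surrogate for the reward: take the pessimistic
    # (smaller) of the unclipped and clipped terms.
    reward_surr = sum(
        min(r * a, clip(r, 1 - eps, 1 + eps) * a)
        for r, a in zip(ratios, reward_adv)
    ) / n
    # Clipped surrogate for the cost: here the pessimistic choice is the larger term.
    cost_surr = sum(
        max(r * a, clip(r, 1 - eps, 1 + eps) * a)
        for r, a in zip(ratios, cost_adv)
    ) / n
    # Assumed ReLU-style penalty: zero while the estimated constraint holds,
    # growing linearly with the estimated violation otherwise.
    penalty = kappa * max(0.0, cost_surr + cost_violation)
    return -reward_surr + penalty


# One sample whose ratio exceeds the clipping range and whose cost is over the limit.
loss = penalized_clipped_loss([1.5], [1.0], [1.0], cost_violation=0.5, kappa=2.0)
```

With the penalty inactive (estimated cost within the limit), the loss reduces to the usual negated PPO clipped objective, which is why a single first-order minimization can stand in for a constrained update in this sketch.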

