Sept. 16, 2022, 1:11 a.m. | Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

cs.LG updates on arXiv.org

Safe reinforcement learning (RL) studies problems in which an intelligent agent
must not only maximize reward but also avoid exploring unsafe regions. In this
study, we propose CUP, a novel policy optimization method based on the
Constrained Update Projection framework that enjoys a rigorous safety
guarantee. Central to our development of CUP are the newly proposed surrogate
functions along with the performance bound. Compared to previous safe RL
methods, CUP offers the following benefits: 1) CUP generalizes the surrogate
functions to generalized advantage …
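The core idea of an "update then project" scheme can be illustrated on a toy problem: first improve a reward surrogate without constraints, then project the parameters back toward the safe set defined by a cost surrogate. The sketch below is our own minimal illustration of that general pattern, not the paper's actual algorithm — the toy `reward`/`cost` functions, step sizes, and the first-order projection are all assumptions made for the example.

```python
# Minimal sketch of an "update then project" step in the spirit of
# constrained policy update methods. Toy 2-D problem:
#   maximize reward(theta)  subject to  cost(theta) <= d
# All function names and numbers here are illustrative assumptions.

def reward(theta):
    # toy reward surrogate to maximize (peaks at theta = (2, 2))
    return -sum((t - 2.0) ** 2 for t in theta)

def cost(theta):
    # toy cost surrogate; the safe set is {theta : cost(theta) <= d}
    return sum(t * t for t in theta)

def grad(f, theta, eps=1e-5):
    # central finite-difference gradient (keeps the sketch dependency-free)
    g = []
    for i in range(len(theta)):
        hi = list(theta); hi[i] += eps
        lo = list(theta); lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def update_then_project(theta, d=1.0, lr=0.1):
    # Step 1: unconstrained improvement on the reward surrogate.
    g_r = grad(reward, theta)
    theta = [t + lr * gi for t, gi in zip(theta, g_r)]
    # Step 2: project back toward the safe set with repeated first-order
    # (Newton-style) corrections along the cost gradient.
    for _ in range(10):
        c = cost(theta)
        if c <= d:
            break
        g_c = grad(cost, theta)
        step = (c - d) / (sum(gi * gi for gi in g_c) + 1e-12)
        theta = [t - step * gi for t, gi in zip(theta, g_c)]
    return theta

theta = [3.0, -3.0]       # start well outside the safe set
for _ in range(100):
    theta = update_then_project(theta)
# theta now approximately maximizes reward subject to cost(theta) <= 1
```

Decoupling the reward step from the safety projection is what lets this family of methods reuse standard policy improvement machinery while enforcing the constraint separately; the paper's contribution lies in the surrogate functions and performance bounds that make such a projection step come with guarantees.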

