Jan. 20, 2022, 2:10 a.m. | Hengrui Zhang, Youfang Lin, Sheng Han, Shuo Wang, Kai Lv

cs.LG updates on arXiv.org

Safe exploration can be regarded as a constrained Markov decision problem in
which the expected long-term cost is constrained. Previous off-policy
algorithms convert the constrained optimization problem into the corresponding
unconstrained dual problem via the Lagrangian relaxation technique. However,
the cost functions of these algorithms provide inaccurate estimates, which
destabilizes learning of the Lagrange multiplier. In this paper, we present a
novel off-policy reinforcement learning algorithm called Conservative
Distributional Maximum a Posteriori Policy Optimization (CDMPO). At first, …
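The Lagrangian relaxation step the abstract refers to can be illustrated with a minimal sketch. This is not the CDMPO implementation; all names, learning rates, and the toy cost trajectory below are illustrative assumptions. The constrained objective max_π E[return] subject to E[cost] ≤ d becomes the unconstrained dual max_π min_{λ≥0} E[return] − λ(E[cost] − d), with the multiplier λ updated by gradient ascent on the constraint violation:

```python
def dual_objective(expected_return, expected_cost, lam, cost_limit):
    """Lagrangian of the constrained MDP for a fixed policy:
    return minus the penalized constraint violation."""
    return expected_return - lam * (expected_cost - cost_limit)

def update_multiplier(lam, expected_cost, cost_limit, lr=0.05):
    """Gradient-ascend the multiplier on the violation
    (expected_cost - cost_limit), then project back onto lam >= 0."""
    return max(0.0, lam + lr * (expected_cost - cost_limit))

# Toy loop: the multiplier grows while the (pretend) policy violates
# the cost limit and shrinks once it satisfies it.
lam, cost_limit = 0.0, 1.0
for step in range(200):
    expected_cost = 2.0 - 0.01 * step  # stand-in for a slowly safer policy
    lam = update_multiplier(lam, expected_cost, cost_limit)
```

The abstract's point is that this dual update is only as good as the cost estimate: if `expected_cost` is inaccurate, `lam` oscillates, which is the instability CDMPO targets.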

