Web: http://arxiv.org/abs/2201.11927

June 20, 2022, 1:11 a.m. | Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

cs.LG updates on arXiv.org

Safe reinforcement learning (RL) aims to learn policies that satisfy certain
constraints before deploying them to safety-critical applications. Previous
primal-dual style approaches suffer from instability issues and lack optimality
guarantees. This paper overcomes these issues from the perspective of
probabilistic inference. We introduce a novel Expectation-Maximization approach
to naturally incorporate constraints during policy learning: 1) a provably
optimal non-parametric variational distribution can be computed in closed
form after a convex optimization (E-step); 2) the policy parameter is improved
within …
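The two-step structure described in the abstract can be sketched in a heavily simplified tabular form. This is an illustrative sketch only, not the paper's implementation: the temperature `eta` and cost multiplier `lam`, which the method obtains by solving a convex optimization in the E-step, are fixed constants here, and all function names are hypothetical.

```python
import numpy as np

def e_step(q_reward, q_cost, pi_old, eta=1.0, lam=0.5):
    """Closed-form non-parametric variational distribution over actions.

    q_reward, q_cost: (S, A) arrays of reward/cost Q-value estimates.
    pi_old: (S, A) current policy probabilities.
    eta and lam are assumed fixed here; in the paper they come from
    solving a convex dual problem.
    """
    logits = (q_reward - lam * q_cost) / eta
    # Reweight the old policy; subtract the row max for numerical stability.
    w = pi_old * np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def m_step(q_var, pi_old, lr=0.5):
    """Move the parametric policy toward the variational distribution
    (a crude stand-in for weighted maximum likelihood under a KL trust
    region, which is how the M-step is typically realized)."""
    return (1 - lr) * pi_old + lr * q_var
```

With a single state, two actions, reward Q-values `[1.0, 0.0]`, and zero cost, the E-step shifts probability mass toward the high-reward action while the cost term `lam * q_cost` would penalize constraint-violating actions.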

