Web: http://arxiv.org/abs/2201.11927

Jan. 31, 2022, 2:11 a.m. | Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

cs.LG updates on arXiv.org

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before being deployed to safety-critical applications. The primal-dual method, a prevalent constrained optimization framework, suffers from instability issues and lacks optimality guarantees. This paper overcomes these issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization style approach to learn a safe policy. We show that the safe RL problem can be decomposed into 1) a convex optimization phase with a non-parametric variational distribution and 2) a supervised learning phase. …

arxiv learning optimization policy reinforcement learning
