Constrained Variational Policy Optimization for Safe Reinforcement Learning. (arXiv:2201.11927v1 [cs.LG])
Web: http://arxiv.org/abs/2201.11927
Jan. 31, 2022, 2:11 a.m. | Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao
Source: cs.LG updates on arXiv.org
Safe reinforcement learning (RL) aims to learn policies that satisfy certain
constraints before being deployed to safety-critical applications. Primal-dual
methods, a prevalent constrained-optimization framework, suffer from
instability issues and lack optimality guarantees. This paper addresses these
issues from a novel probabilistic inference perspective and proposes an
Expectation-Maximization style approach to learning safe policies. We show
that the safe RL problem can be decomposed into 1) a convex optimization phase
with a non-parametric variational distribution and 2) a supervised learning
phase. …
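The two-phase decomposition the abstract describes can be illustrated with a small tabular sketch. Below, the E-step solves a per-state constrained problem over a non-parametric action distribution via dual ascent on a Lagrange multiplier (convex in the distribution for a fixed multiplier), and the M-step projects the result back onto a parametric softmax policy by weighted maximum likelihood. This is a minimal sketch under assumed ingredients: the tabular critics `q_reward`/`q_cost`, the dual-ascent update, and all hyperparameters are illustrative, not the paper's exact CVPO algorithm.

```python
# Minimal tabular sketch of an EM-style safe-RL update:
# E-step (convex, non-parametric) + M-step (supervised projection).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
q_reward = rng.normal(size=(n_states, n_actions))  # assumed reward critic Q_r(s, a)
q_cost = rng.uniform(size=(n_states, n_actions))   # assumed cost critic Q_c(s, a)
cost_limit = 0.4                                   # illustrative expected-cost budget

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def e_step(eta=1.0, dual_lr=0.5, n_iters=200):
    """E-step: per state, find non-parametric action weights w(a|s)
    maximizing reward subject to E_w[Q_c] <= cost_limit, by dual
    ascent on a Lagrange multiplier lam."""
    lam = 0.0
    w = softmax(q_reward / eta)
    for _ in range(n_iters):
        # For a fixed lam, the variational distribution has a closed form:
        # w(a|s) proportional to exp((Q_r(s,a) - lam * Q_c(s,a)) / eta).
        w = softmax((q_reward - lam * q_cost) / eta)
        # Dual ascent: raise lam while the cost constraint is violated.
        violation = (w * q_cost).sum(axis=1).mean() - cost_limit
        lam = max(0.0, lam + dual_lr * violation)
    return w

def m_step(w, lr=0.5, n_iters=300):
    """M-step: supervised projection of w onto a parametric softmax
    policy by minimizing cross-entropy (weighted maximum likelihood)."""
    logits = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        p = softmax(logits)
        logits -= lr * (p - w)  # gradient of cross-entropy H(w, p) w.r.t. logits
    return softmax(logits)

w = e_step()
policy = m_step(w)
print("expected cost per state:", (policy * q_cost).sum(axis=1))
```

Separating the two phases is what gives the approach its appeal: the E-step is a convex problem with an (approximately) optimal solution for the non-parametric distribution, while the M-step reduces policy fitting to ordinary supervised learning.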