Constrained Variational Policy Optimization for Safe Reinforcement Learning. (arXiv:2201.11927v3 [cs.LG] UPDATED)
Web: http://arxiv.org/abs/2201.11927
cs.LG updates on arXiv.org
Safe reinforcement learning (RL) aims to learn policies that satisfy certain
constraints before deploying them to safety-critical applications. Previous
primal-dual style approaches suffer from instability issues and lack optimality
guarantees. This paper addresses these issues from the perspective of
probabilistic inference. We introduce a novel Expectation-Maximization approach
that naturally incorporates constraints during policy learning: 1) a provably
optimal non-parametric variational distribution can be computed in closed
form after a convex optimization (E-step); 2) the policy parameter is improved
within …
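
To make the E-step description more concrete, here is a minimal sketch for a single state with a handful of discrete actions. The excerpt does not state the closed form, so the reweighting rule used here (the old policy multiplied by exp((Q_r - lam * Q_c) / eta)) is an assumption borrowed from the usual KL-regularized, constrained policy-search setup, not necessarily the paper's exact expression. The function name, the toy Q-values, and the hand-picked dual variables eta and lam are all hypothetical; in the method described above, the dual variables would come out of the convex optimization rather than being fixed by hand.

```python
import numpy as np

def e_step_distribution(pi_old, q_reward, q_cost, eta, lam):
    """Assumed closed form: q(a) proportional to pi_old(a) * exp((Q_r(a) - lam * Q_c(a)) / eta).

    pi_old must be strictly positive; eta > 0 is the temperature tied to the
    KL bound and lam >= 0 is the multiplier on the cost constraint.
    """
    logits = np.log(pi_old) + (q_reward - lam * q_cost) / eta
    logits = logits - logits.max()   # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()   # normalize to a probability vector

# Toy inputs (illustrative only): four actions, uniform old policy.
pi_old = np.array([0.25, 0.25, 0.25, 0.25])
q_reward = np.array([1.0, 2.0, 3.0, 4.0])   # reward critic values
q_cost = np.array([0.0, 0.0, 1.0, 5.0])     # cost critic values

q = e_step_distribution(pi_old, q_reward, q_cost, eta=1.0, lam=1.0)
expected_cost_old = float(np.dot(pi_old, q_cost))
expected_cost_new = float(np.dot(q, q_cost))
```

With lam > 0, probability mass shifts away from the high-cost action even though it has the highest reward, so the expected cost under q drops below that of the old policy. Setting lam = 0 in this sketch recovers a purely reward-driven reweighting.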