Jan. 26, 2022, 2:11 a.m. | Yarden As, Ilnura Usmanova, Sebastian Curi, Andreas Krause

cs.LG updates on arXiv.org

Improving sample efficiency and safety are crucial challenges when deploying
reinforcement learning in high-stakes real-world applications. We propose
LAMBDA, a novel model-based approach for policy optimization in safety-critical
tasks modeled via constrained Markov decision processes. Our approach utilizes
Bayesian world models, and harnesses the resulting uncertainty to maximize
optimistic upper bounds on the task objective, as well as pessimistic upper
bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art
performance on the Safety-Gym benchmark suite in …
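
The optimistic and pessimistic bounds described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a Bayesian world model has already been sampled several times and that each sample yields one estimate of a policy's task return and safety cost. The function names, the `cost_budget` parameter, and the use of a plain maximum over samples as the upper bound are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def upper_bounds(returns: Sequence[float], costs: Sequence[float]) -> Tuple[float, float]:
    """Compute upper bounds over posterior samples (hypothetical helper).

    returns[i] and costs[i] are the estimated task return and safety cost
    of one fixed policy under the i-th sampled world model.
    """
    if not returns or len(returns) != len(costs):
        raise ValueError("need one return and one cost per sampled model")
    optimistic_return = max(returns)  # optimism: best plausible task return
    pessimistic_cost = max(costs)     # pessimism: worst plausible safety cost
    return optimistic_return, pessimistic_cost


def is_plausibly_safe(costs: Sequence[float], cost_budget: float) -> bool:
    """Accept a policy only if its worst sampled cost stays within the
    constraint threshold of the constrained MDP (assumed given)."""
    return max(costs) <= cost_budget


# Illustrative numbers only, not results from the paper.
sampled_returns = [1.0, 2.5, 2.0]
sampled_costs = [0.3, 0.1, 0.6]
bounds = upper_bounds(sampled_returns, sampled_costs)
safe = is_plausibly_safe(sampled_costs, cost_budget=0.5)
```

Taking the maximum for both quantities reflects the asymmetry in the abstract: uncertainty is resolved in favor of the objective but against the safety constraint, so a policy is only treated as safe if every sampled model agrees.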

