Web: http://arxiv.org/abs/2201.09802

Jan. 26, 2022, 2:11 a.m. | Yarden As, Ilnura Usmanova, Sebastian Curi, Andreas Krause

cs.LG updates on arXiv.org

Improving sample efficiency and safety are crucial challenges when deploying
reinforcement learning in high-stakes real-world applications. We propose
LAMBDA, a novel model-based approach for policy optimization in safety-critical
tasks modeled via constrained Markov decision processes. Our approach utilizes
Bayesian world models, and harnesses the resulting uncertainty to maximize
optimistic upper bounds on the task objective, as well as pessimistic upper
bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art
performance on the Safety-Gym benchmark suite in …
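
The idea in the abstract, optimism for the task objective and pessimism for the safety constraint, both derived from world-model uncertainty, can be illustrated with a toy sketch. This is not LAMBDA's implementation. It assumes you already have returns estimated under several posterior samples of a Bayesian world model (for example, ensemble members). The function names, the max-over-samples bounds, and the plain Lagrangian with projected dual ascent are all hypothetical choices made for illustration.

```python
from typing import Sequence

# Hypothetical helpers (not from the paper): inputs are the returns (reward or
# cost) that one policy achieves under several posterior samples of a Bayesian
# world model.


def optimistic_objective(reward_returns: Sequence[float]) -> float:
    """Optimistic upper bound on the task objective: best case over samples."""
    return max(reward_returns)


def pessimistic_cost(cost_returns: Sequence[float]) -> float:
    """Pessimistic upper bound on the safety cost: worst case over samples."""
    return max(cost_returns)


def lagrangian(reward_returns: Sequence[float],
               cost_returns: Sequence[float],
               budget: float,
               lam: float) -> float:
    """Penalized objective the policy would maximize (assumed plain Lagrangian)."""
    violation = pessimistic_cost(cost_returns) - budget
    return optimistic_objective(reward_returns) - lam * violation


def update_multiplier(lam: float,
                      cost_returns: Sequence[float],
                      budget: float,
                      lr: float = 0.1) -> float:
    """Projected dual ascent: raise lam while the pessimistic cost exceeds the budget."""
    violation = pessimistic_cost(cost_returns) - budget
    return max(0.0, lam + lr * violation)


if __name__ == "__main__":
    rewards = [10.0, 12.0, 11.0]  # made-up returns under three posterior samples
    costs = [2.0, 5.0, 3.0]       # made-up costs under the same samples
    print(lagrangian(rewards, costs, budget=4.0, lam=0.5))  # 12 - 0.5 * (5 - 4) = 11.5
    print(update_multiplier(0.5, costs, budget=4.0))        # multiplier grows: cost 5 > budget 4
```

Taking the maximum over posterior samples is only one simple way to form an upper bound; it stands in here for whatever bound the paper actually uses.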
