Web: http://arxiv.org/abs/1911.09101

Jan. 14, 2022, 2:10 a.m. | Santiago Paternain, Miguel Calvo-Fullana, Luiz F. O. Chamon, Alejandro Ribeiro

cs.LG updates on arXiv.org

In this paper, we study the learning of safe policies in the setting of
reinforcement learning problems. That is, we aim to control a Markov Decision
Process (MDP) whose transition probabilities are unknown, but from which we can
obtain sample trajectories through experience. We define safety as the agent
remaining in a desired safe set with high probability during the operation
time. We therefore consider a constrained MDP where the constraints are
probabilistic. Since there is no straightforward way to optimize the policy
with respect to the …
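
The safety notion in the abstract (the agent stays inside a safe set for the whole operation time with high probability) can be estimated from sampled trajectories alone, without knowing the transition probabilities. The sketch below is only an illustration of that definition, not the method of the paper: the names `rollout`, `estimate_safety`, `toy_step`, `toward_origin`, and `drift_right`, as well as the 1-D random-walk environment and the safe set |x| <= 5, are all assumptions made up for this example.

```python
import random


def rollout(policy, step, start, horizon, rng):
    """Sample one trajectory of `horizon` transitions from the simulator `step`."""
    state, states = start, [start]
    for _ in range(horizon):
        state = step(state, policy(state), rng)
        states.append(state)
    return states


def estimate_safety(policy, step, is_safe, start, horizon, n_rollouts, seed=0):
    """Monte Carlo estimate of P(every visited state is in the safe set)."""
    rng = random.Random(seed)
    n_safe = 0
    for _ in range(n_rollouts):
        trajectory = rollout(policy, step, start, horizon, rng)
        # A trajectory counts as safe only if it never leaves the safe set.
        n_safe += all(is_safe(s) for s in trajectory)
    return n_safe / n_rollouts


# Hypothetical toy environment: a 1-D walk with additive noise in {-1, 0, 1}.
def toy_step(state, action, rng):
    return state + action + rng.choice((-1, 0, 1))


def toward_origin(state):
    """Hypothetical policy that always pushes one unit back toward 0."""
    return -1 if state > 0 else (1 if state < 0 else 0)


def drift_right(state):
    """Hypothetical policy that ignores the state and always moves right."""
    return 1


def in_safe_set(x):
    return abs(x) <= 5


p_safe = estimate_safety(toward_origin, toy_step, in_safe_set, 0, 20, 1000)
p_drift = estimate_safety(drift_right, toy_step, in_safe_set, 0, 20, 1000)
print(p_safe, p_drift)
```

In this toy setting the first policy never leaves the interval [-1, 1], so its estimate is exactly 1.0, while the drifting policy almost always exits the safe set. A probabilistic constraint of the kind described above would require such an estimate to stay above a chosen level, for example 1 - δ for a small δ.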

