March 26, 2024, 4:41 a.m. | Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

cs.LG updates on arXiv.org

arXiv:2403.15928v1 Announce Type: new
Abstract: We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the attention the topic has received from the scientific community, the problem of learning an optimal policy under a stochastic stopping time, without violating safety constraints during the learning phase, has yet to be addressed. To this end, we propose an algorithm based on linear programming that does not require a process model. We show that the learned policy is …

