Web: http://arxiv.org/abs/2201.11965

Jan. 31, 2022, 2:11 a.m. | Yuhao Ding, Javad Lavaei

cs.LG updates on arXiv.org

We consider primal-dual-based reinforcement learning (RL) in episodic
constrained Markov decision processes (CMDPs) with non-stationary objectives
and constraints, which play a central role in ensuring the safety of RL in
time-varying environments. In this problem, the reward/utility functions and
the state transition functions are both allowed to vary arbitrarily over time
as long as their cumulative variations do not exceed certain known variation
budgets. Designing safe RL algorithms in time-varying environments is
particularly challenging because of the need to integrate …
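The abstract is cut off before describing the algorithm, but the primal-dual approach it references can be illustrated. Below is a minimal, hypothetical sketch of a generic Lagrangian primal-dual update for a constrained objective, not the paper's actual method. The toy single-state policy, the step sizes, the constraint threshold, and the slowly drifting reward (standing in for a bounded "variation budget") are all assumptions for illustration.

```python
import numpy as np

# Hedged sketch of a generic primal-dual (Lagrangian) update for a CMDP-style
# objective: maximize V_r(pi) subject to V_g(pi) >= b.
# Lagrangian: L(pi, lam) = V_r(pi) + lam * (V_g(pi) - b), with lam >= 0.
# All quantities below are illustrative assumptions, not the paper's setup.

n_actions = 4
b = 0.5            # constraint threshold (assumed)
eta_pi = 0.1       # primal step size (assumed)
eta_lam = 0.05     # dual step size (assumed)
lam = 0.0          # dual variable, projected onto [0, lam_max]
lam_max = 10.0

theta = np.zeros(n_actions)  # softmax policy logits over a single toy state

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

for t in range(1000):
    # Toy non-stationary reward per action: drifts slowly over time, mimicking
    # a bounded cumulative variation (a "variation budget"); utility is fixed.
    r = np.array([0.2, 0.9, 0.4, 0.1]) + 0.001 * np.sin(0.01 * t)
    g = np.array([0.8, 0.3, 0.6, 0.9])

    pi = softmax(theta)
    V_g = pi @ g  # expected utility under the current policy

    # Primal step: gradient ascent on the Lagrangian w.r.t. the policy logits.
    # For a softmax policy, d(pi @ v)/dtheta_i = pi_i * (v_i - pi @ v).
    v = r + lam * g
    theta += eta_pi * (pi * (v - pi @ v))

    # Dual step: projected gradient ascent on the constraint violation b - V_g.
    lam = np.clip(lam + eta_lam * (b - V_g), 0.0, lam_max)

print(f"final policy {softmax(theta).round(3)}, lambda {lam:.3f}")
```

The dual variable lam grows while the utility constraint is violated, which tilts the primal update toward constraint satisfaction; in the non-stationary setting the paper studies, such updates must additionally track the drifting rewards and transitions within the known variation budgets.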
