Web: http://arxiv.org/abs/2206.07166

June 16, 2022, 1:10 a.m. | Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou

cs.LG updates on arXiv.org

Offline reinforcement learning (RL) extends classical RL to learning purely
from static datasets, without interacting with the underlying environment
during training. A key challenge of offline RL is the instability of policy
training, caused by the mismatch between the distribution of the offline data
and the undiscounted stationary state-action distribution of the learned
policy. To avoid the detrimental impact of this distribution mismatch, we
regularize the undiscounted stationary distribution of the current policy
towards the …
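The abstract is truncated, so the paper's exact regularizer is not shown here. As a hedged illustration of the general idea only — penalizing a policy whose state-action distribution drifts from the data distribution — here is a toy sketch in a discrete setting. The function names, the choice of KL divergence, and the penalty weight `alpha` are all illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete distributions (illustrative choice;
    the paper's specific divergence is not given in this truncated abstract)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def regularized_objective(returns, d_pi, d_data, alpha=1.0):
    """Expected return under the policy's stationary state-action
    distribution d_pi, penalized by its divergence from the offline
    data distribution d_data. `alpha` trades off reward vs. staying
    close to the data (a hypothetical knob for this sketch)."""
    expected_return = float(np.dot(d_pi, returns))
    return expected_return - alpha * kl_divergence(d_pi, d_data)

# Two candidate policies over the same two state-action pairs:
# one stays close to the data distribution, one drifts far from it.
returns = [1.0, 0.0]
d_data = [0.9, 0.1]          # empirical distribution of the offline dataset
obj_in = regularized_objective(returns, [0.8, 0.2], d_data, alpha=1.0)
obj_out = regularized_objective(returns, [0.2, 0.8], d_data, alpha=1.0)
```

With the penalty active, the in-distribution policy scores higher even though both are evaluated on the same reward vector — the out-of-distribution policy pays a large divergence cost, which is the instability-avoidance intuition the abstract describes.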

