Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning. (arXiv:2210.13846v1 [cs.LG])
Offline reinforcement learning learns agent behaviors from a fixed dataset,
without interacting with the environment. However, depending on the quality of
that dataset, a pre-trained agent may have limited performance and need further
online fine-tuning through environment interaction. During online fine-tuning,
the pre-trained agent's performance can collapse quickly because of the sudden
distribution shift from offline to online data. While constraints enforced by
offline RL methods …
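The idea named in the title can be illustrated with a small sketch. This is not the paper's exact schedule; it assumes a TD3+BC-style actor objective (negated Q-value plus a behavior-cloning penalty) and a hypothetical rule that strengthens the BC coefficient when recent online returns fall below their moving average (the policy is degrading, so pull back toward the offline data) and relaxes it otherwise. The function name, the EMA rule, and all parameters are illustrative assumptions.

```python
def adaptive_bc_weight(alpha, ret, ret_ema, beta=0.05, step=0.01,
                       alpha_min=0.0, alpha_max=1.0):
    """Adapt a behavior-cloning coefficient from online episode returns.

    Hypothetical rule (not the paper's exact schedule):
      - update an exponential moving average (EMA) of returns;
      - if the latest return is below the EMA, the fine-tuned policy is
        degrading, so increase alpha (constrain closer to the dataset);
      - otherwise decrease alpha to let online learning improve the policy.

    The actor would then minimize, per batch:
        loss = -Q(s, pi(s)) + alpha * mse(pi(s), a_dataset)
    """
    ret_ema = (1.0 - beta) * ret_ema + beta * ret
    if ret < ret_ema:
        alpha = min(alpha + step, alpha_max)  # returns dropping: tighten BC
    else:
        alpha = max(alpha - step, alpha_min)  # returns improving: relax BC
    return alpha, ret_ema


# Toy usage: a run of improving returns should steadily shrink alpha.
alpha, ema = 0.5, 0.0
for ret in [1.0, 2.0, 3.0, 4.0]:
    alpha, ema = adaptive_bc_weight(alpha, ret, ema)
```

The point of making the coefficient adaptive, rather than fixed, is that a constant BC weight either over-constrains fine-tuning (performance plateaus at the dataset's level) or under-constrains it (performance collapses under the offline-to-online distribution shift).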
Tags: arXiv, behavior cloning, offline reinforcement learning, online reinforcement learning, regularization