Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps. (arXiv:2203.13935v3 [cs.LG] UPDATED)
Web: http://arxiv.org/abs/2203.13935
stat.ML updates on arXiv.org
We consider a challenging theoretical problem in offline reinforcement
learning (RL): obtaining sample-efficiency guarantees with a dataset lacking
sufficient coverage, under only realizability-type assumptions for the function
approximators. While the existing theory has addressed learning under
realizability and under non-exploratory data separately, no work has been able
to address both simultaneously (except for a concurrent work, which we compare
against in detail). Under an additional gap assumption, we provide guarantees for a
simple pessimistic algorithm based on a version space formed …