Web: http://arxiv.org/abs/2202.11474

June 20, 2022, 1:12 a.m. | Shuang Wu, Chi-Hua Wang, Yuantong Li, Guang Cheng

stat.ML updates on arXiv.org

We propose a new bootstrap-based online algorithm for stochastic linear
bandit problems. The key idea is to adopt residual bootstrap exploration, in
which the agent estimates the next-step reward by re-sampling the residuals of
the mean reward estimate. Our algorithm, residual bootstrap exploration for
stochastic linear bandit (LinReBoot), estimates the linear reward from
its re-sampling distribution and pulls the arm with the highest reward
estimate. In particular, we contribute a theoretical framework to demystify
residual bootstrap-based exploration mechanisms in stochastic …
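The abstract is truncated and does not spell out the exact LinReBoot update, so the following is only a minimal sketch of the general idea it describes. It assumes the textbook residual bootstrap for a ridge-regularized linear model: fit the reward parameter, center the residuals, resample them with replacement, perturb the estimate with the resampled residuals, and pull the arm with the highest perturbed reward. The ridge parameter `lam`, the warm start, and every helper name (`choose_arm`, `run`, `solve`, `gram`, `xt_times`) are hypothetical and are not taken from the paper. The sketch uses only the Python standard library.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gram(X, lam):
    """Regularized Gram matrix A = lam * I + sum_t x_t x_t^T (hypothetical ridge setup)."""
    d = len(X[0])
    return [[lam * (i == j) + sum(x[i] * x[j] for x in X) for j in range(d)]
            for i in range(d)]


def xt_times(X, v):
    """Compute X^T v for a list of feature rows X."""
    d = len(X[0])
    return [sum(x[i] * vi for x, vi in zip(X, v)) for i in range(d)]


def choose_arm(arms, X, y, rng, lam=1.0):
    """One residual-bootstrap draw; returns the index of the arm to pull."""
    A = gram(X, lam)
    theta_hat = solve(A, xt_times(X, y))              # ridge estimate of the reward parameter
    residuals = [yi - dot(x, theta_hat) for x, yi in zip(X, y)]
    mean_r = sum(residuals) / len(residuals)
    centered = [r - mean_r for r in residuals]        # center residuals before resampling
    boot = [rng.choice(centered) for _ in centered]   # resample residuals with replacement
    # theta* = theta_hat + A^{-1} X^T e*; for lam = 0 this equals an OLS refit
    # on the pseudo-responses X theta_hat + e*.
    delta = solve(A, xt_times(X, boot))
    theta_star = [t + dlt for t, dlt in zip(theta_hat, delta)]
    scores = [dot(a, theta_star) for a in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def run(theta_true, arms, horizon, noise=0.1, seed=0):
    """Simulate a linear bandit with Gaussian noise; returns the pulled arm indices."""
    rng = random.Random(seed)
    X, y, pulls = [], [], []

    def pull(k):
        X.append(arms[k])
        y.append(dot(arms[k], theta_true) + rng.gauss(0.0, noise))
        pulls.append(k)

    for k in range(len(arms)):        # hypothetical warm start: pull every arm once
        pull(k)
    while len(pulls) < horizon:
        pull(choose_arm(arms, X, y, rng))
    return pulls


if __name__ == "__main__":
    pulls = run([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], horizon=100)
    print("pulls of arm 0:", pulls.count(0))
```

In this sketch the bootstrap noise enters through A^{-1} X^T e*, so arms whose feature directions have been observed less often receive larger perturbations. That arm-dependent spread is what drives exploration here, in place of an explicit confidence bonus.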


