Web: http://arxiv.org/abs/2109.13595

Sept. 16, 2022, 1:12 a.m. | Lin Fan, Peter W. Glynn

cs.LG updates on arXiv.org

Much of the literature on optimal design of bandit algorithms is based on
minimization of expected regret. It is well known that designs that are optimal
over certain exponential families can achieve expected regret that grows
logarithmically in the number of arm plays, at a rate governed by the
Lai-Robbins lower bound. In this paper, we show that when one uses such
optimized designs, the regret distribution of the associated algorithms
necessarily has a very heavy tail, specifically, that of …
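
For a concrete sense of the logarithmic rate mentioned above, the sketch below computes the constant in the Lai-Robbins lower bound for the special case of Bernoulli arms, where the bound reads: expected regret after n plays grows at least like log(n) times the sum, over suboptimal arms, of (gap to the best mean) / KL(arm mean, best mean). Restricting to Bernoulli arms is an assumption made here for illustration, since the abstract speaks of exponential families in general, and the function names are hypothetical rather than taken from the paper. The bound is asymptotic, so the last function gives only the leading-order term, not a finite-sample guarantee.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) in nats.

    Assumes 0 <= p <= 1 and 0 < q < 1 (illustrative helper, not from the paper).
    """
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def lai_robbins_constant(means):
    """Sum over suboptimal arms of gap / KL(arm mean, best mean).

    Assumes every mean lies strictly between 0 and 1.
    """
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


def regret_lower_bound(means, n):
    """Leading-order term of the asymptotic lower bound: constant * log(n)."""
    return lai_robbins_constant(means) * math.log(n)


if __name__ == "__main__":
    arm_means = [0.5, 0.4]  # hypothetical two-armed instance
    print(lai_robbins_constant(arm_means))
    print(regret_lower_bound(arm_means, 10_000))
```

The constant grows as the arms become harder to tell apart, because the KL divergence shrinks quadratically in the gap while the gap itself shrinks only linearly. The sketch covers only the expected-regret rate; it says nothing about the tail of the regret distribution, which is the subject of the paper.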

