Web: http://arxiv.org/abs/2112.13838

June 17, 2022, 1:12 a.m. | Joe Suk, Samory Kpotufe

stat.ML updates on arXiv.org

In bandits with distribution shifts, one aims to automatically adapt to
unknown changes in the reward distributions and to restart exploration when
necessary. While this problem has been studied for many years, a recent
breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure
to guarantee an optimal (dynamic) regret of $\sqrt{LT}$ over $T$ rounds with an
unknown number $L$ of changes. However, while this rate is tight in the worst
case, it remained open whether faster rates are possible, without …
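To make the setting concrete, here is a minimal, self-contained sketch of a piecewise-stationary Bernoulli bandit together with UCB1 run either with oracle restarts at the (here, known) change points or with no restarts at all. This is purely illustrative of the "restart exploration when necessary" idea in the abstract; it is not the adaptive procedure of Auer et al., which detects changes without knowing them. All names and parameters below are hypothetical.

```python
import math
import random

def ucb1_piecewise(means_per_phase, phase_len, restart=True, seed=0):
    """Run UCB1 on a piecewise-stationary Bernoulli bandit.

    means_per_phase: list of per-arm mean vectors, one vector per phase.
    phase_len:       number of rounds in each phase.
    restart:         if True, reset all statistics at each (oracle-known)
                     change point; if False, never reset.
    Returns the dynamic regret: the sum over rounds of the gap between
    the best arm's mean in the current phase and the chosen arm's mean.
    """
    rng = random.Random(seed)
    k = len(means_per_phase[0])
    counts = [0] * k       # pulls per arm since the last restart
    sums = [0.0] * k       # reward totals per arm since the last restart
    t_local = 0            # rounds since the last restart
    regret = 0.0
    for means in means_per_phase:
        if restart:
            counts = [0] * k
            sums = [0.0] * k
            t_local = 0
        best = max(means)
        for _ in range(phase_len):
            t_local += 1
            if 0 in counts:
                # Pull each arm once before trusting the indices.
                arm = counts.index(0)
            else:
                # Pick the arm maximizing the UCB1 index:
                # empirical mean plus an exploration bonus.
                arm = max(
                    range(k),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t_local) / counts[a]),
                )
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]
    return regret
```

With two phases whose best arm flips (e.g. means `[0.9, 0.1]` then `[0.1, 0.9]`), the no-restart run must unlearn stale statistics after the change, which is exactly the cost that adaptive restart procedures try to control without knowing $L$ or the change points in advance.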

