### Web: http://arxiv.org/abs/2112.13838

June 17, 2022, 1:12 a.m. | Joe Suk, Samory Kpotufe

In bandits with distribution shifts, one aims to automatically adapt to
unknown changes in the reward distribution and to restart exploration when
necessary. While this problem has been studied for many years, a recent
breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure
to guarantee an optimal (dynamic) regret of $\sqrt{LT}$, for $T$ rounds and an
unknown number $L$ of changes. However, while this rate is tight in the worst
case, it remained open whether faster rates are possible, without …
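To make the setting concrete, here is a minimal sketch of a piecewise-stationary bandit and the restart idea the abstract alludes to: a two-armed Bernoulli bandit whose best arm switches once, played by UCB1 with periodic forgetting. This is an illustrative toy, not the adaptive procedure of Auer et al. (which detects changes rather than restarting on a fixed schedule); all parameter values (`horizon`, `change_at`, `restart_every`, the arm means) are made up for illustration.

```python
import math
import random


def piecewise_bandit_run(horizon=2000, change_at=1000, restart_every=500, seed=0):
    """Run UCB1 with periodic restarts on a 2-armed Bernoulli bandit
    whose best arm switches once, and return the dynamic regret, i.e.
    the cumulative gap to the *current* best arm at each round."""
    rng = random.Random(seed)
    # Arm means (arm0, arm1) before and after the single change point.
    means = [(0.7, 0.3), (0.3, 0.7)]
    counts = [0, 0]   # pulls per arm since last restart
    sums = [0.0, 0.0]  # reward totals per arm since last restart
    regret = 0.0
    for t in range(horizon):
        # Periodic restart: forget all statistics and re-explore.
        if restart_every and t % restart_every == 0:
            counts = [0, 0]
            sums = [0.0, 0.0]
        mu = means[0] if t < change_at else means[1]
        if 0 in counts:
            # Pull each arm once before using the UCB index.
            arm = counts.index(0)
        else:
            ucb = [sums[a] / counts[a]
                   + math.sqrt(2.0 * math.log(t + 1) / counts[a])
                   for a in range(2)]
            arm = 0 if ucb[0] >= ucb[1] else 1
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += max(mu) - mu[arm]  # dynamic regret vs. current best arm
    return regret
```

With `restart_every=0` the restarts are disabled, so the same function also shows how plain UCB1 lags after the change point; the question the paper studies is how to get the benefit of restarting without knowing when (or how many times, $L$) the distribution shifts.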

### Machine Learning Researcher - Saalfeld Lab

@ Howard Hughes Medical Institute - Chevy Chase, MD | Ashburn, Virginia

### Project Director, Machine Learning in US Health

@ ideas42.org | Remote, US

### Data Science Intern

@ NannyML | Remote

### Machine Learning Engineer NLP/Speech

@ Play.ht | Remote

### Research Scientist, 3D Reconstruction

@ Yembo | Remote, US

### Clinical Assistant or Associate Professor of Management Science and Systems

@ University at Buffalo | Buffalo, NY