Oct. 27, 2022, 1:13 a.m. | Thomas Kleine Buening, Aadirupa Saha

stat.ML updates on arXiv.org

We study the problem of non-stationary dueling bandits and provide the first
adaptive dynamic regret algorithm for this problem. The only two existing
attempts in this line of work fall short across multiple dimensions, including
pessimistic measures of non-stationary complexity and non-adaptive parameter
tuning that requires knowledge of the number of preference changes. We develop
an elimination-based rescheduling algorithm to overcome these shortcomings and
show a near-optimal $\tilde{O}(\sqrt{S^{\texttt{CW}} T})$ dynamic regret bound,
where $S^{\texttt{CW}}$ is the number of times the …
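
To make the setting concrete, here is a minimal sketch of a non-stationary dueling bandit environment. It is not the paper's algorithm. It only simulates the feedback model the abstract refers to: a learner picks two arms per round and observes which one wins, while the underlying preferences change at times unknown to the learner. The class name `NonStationaryDuelingEnv`, the `phases` layout, and the example matrices are all hypothetical choices made for illustration.

```python
import random


class NonStationaryDuelingEnv:
    """Toy environment (illustrative assumption, not from the paper)."""

    def __init__(self, phases, seed=0):
        # phases: list of (start_round, P), sorted by start_round, first start is 0.
        # P[i][j] is the probability that arm i beats arm j during that phase.
        self.phases = phases
        self.rng = random.Random(seed)

    def matrix_at(self, t):
        """Return the preference matrix in effect at round t."""
        current = self.phases[0][1]
        for start, P in self.phases:
            if t >= start:
                current = P
        return current

    def num_preference_changes(self):
        """Number of times the preference matrix is swapped out."""
        return len(self.phases) - 1

    def duel(self, t, i, j):
        """Return True if arm i wins the duel against arm j at round t."""
        return self.rng.random() < self.matrix_at(t)[i][j]


# Hypothetical two-arm example: arm 0 is preferred early, arm 1 later.
P_early = [[0.5, 0.9], [0.1, 0.5]]
P_late = [[0.5, 0.2], [0.8, 0.5]]
env = NonStationaryDuelingEnv([(0, P_early), (1000, P_late)])

# Empirical win rate of arm 0 over arm 1 in each phase.
early_rate = sum(env.duel(t, 0, 1) for t in range(1000)) / 1000
late_rate = sum(env.duel(t, 0, 1) for t in range(1000, 2000)) / 1000
print(early_rate, late_rate)
```

A learner in this setting sees only the boolean outcome of `duel`, never the matrices or the change points, which is why an algorithm that needs the number of preference changes as an input is considered non-adaptive.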

