March 7, 2024, 5:42 a.m. | Amaury Gouverneur, Borja Rodr\'iguez-G\'alvez, Tobias J. Oechtering, Mikael Skoglund

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.03361v1 Announce Type: cross
Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, we establish …

abstract algorithm analysis arxiv bayesian cs.lg framework information linear paper rate sampling stat.ml studies the information type van

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US