Web: http://arxiv.org/abs/2106.08771

May 4, 2022, 1:12 a.m. | Nicolas Gast (POLARIS), Bruno Gaujal (POLARIS), Kimang Khun (POLARIS)

cs.LG updates on arXiv.org

We study learning algorithms for the classical discounted Markovian bandit problem. We explain how to adapt PSRL [24] and UCRL2 [2] to exploit the problem structure; these variants are called MB-PSRL and MB-UCRL2. While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$, where $K$ is the number of episodes, $n$ is the number of bandits and …
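To make the structure being exploited concrete, here is a minimal sketch of the MB-PSRL episode loop under stated assumptions: $n$ rested bandits, each a Markov chain on $S$ states with known per-state rewards, Dirichlet posteriors over each bandit's transition kernel, and planning via Gittins indices (for a sampled model of a discounted rested Markovian bandit, the Gittins index policy is optimal, which lets planning run per bandit instead of over the exponential joint state space). All names below (sample_models, gittins_indices, run_episode, step) are illustrative, not the paper's API.

```python
# Hypothetical sketch of MB-PSRL, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def sample_models(counts):
    """Draw one transition matrix per bandit from its Dirichlet posterior."""
    return [np.array([rng.dirichlet(row) for row in c]) for c in counts]

def gittins_indices(P, r, gamma, vi_iters=200, bisect_iters=40):
    """Gittins index of each state via calibration: the index of state s is
    the per-step subsidy m at which retiring (value m/(1-gamma)) and
    continuing from s are equally attractive."""
    S = len(r)
    idx = np.zeros(S)
    for s in range(S):
        lo, hi = r.min(), r.max()          # the index is a per-step rate
        for _ in range(bisect_iters):
            m = 0.5 * (lo + hi)
            V = np.zeros(S)
            for _ in range(vi_iters):      # optimal stopping via value iteration
                V = np.maximum(m / (1.0 - gamma), r + gamma * P @ V)
            if V[s] > m / (1.0 - gamma) + 1e-9:
                lo = m                     # continuing still wins, so index > m
            else:
                hi = m
        idx[s] = 0.5 * (lo + hi)
    return idx

def run_episode(counts, rewards, gamma, step, states, horizon):
    """One MB-PSRL episode: sample a model, plan once per bandit, act greedily
    on the Gittins indices, and update the per-bandit posterior counts."""
    P_hat = sample_models(counts)
    indices = [gittins_indices(P, r, gamma) for P, r in zip(P_hat, rewards)]
    for _ in range(horizon):
        i = int(np.argmax([indices[b][s] for b, s in enumerate(states)]))
        s_next = step(i, states[i])        # only the played bandit moves
        counts[i][states[i]][s_next] += 1  # conjugate Dirichlet count update
        states[i] = s_next
    return states
```

Note the design point the abstract's scalability claim rests on: the per-episode planning cost here grows linearly in $n$ (one index computation per bandit), whereas planning over the joint state space of all bandits would be exponential in $n$.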

Tags: arxiv, learning, optimism, posterior, reinforcement, reinforcement learning, sampling, scalable
