March 12, 2024, 4:44 a.m. | Ian A. Kash, Lev Reyzin, Zishun Yu

cs.LG updates on arXiv.org

arXiv:2205.09056v3 Announce Type: replace
Abstract: Reinforcement learning generalizes multi-armed bandit problems with additional difficulties of a longer planning horizon and unknown transition kernel. We explore a black-box reduction from discounted infinite-horizon tabular reinforcement learning to multi-armed bandits, where, specifically, an independent bandit learner is placed in each state. We show that, under ergodicity and fast mixing assumptions, any slowly changing adversarial bandit algorithm achieving optimal regret in the adversarial bandit setting can also attain optimal expected regret in infinite-horizon discounted …
