Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
March 12, 2024, 4:44 a.m. | Ian A. Kash, Lev Reyzin, Zishun Yu
cs.LG updates on arXiv.org
Abstract: Reinforcement learning generalizes multi-armed bandit problems with the additional difficulties of a longer planning horizon and an unknown transition kernel. We explore a black-box reduction from discounted infinite-horizon tabular reinforcement learning to multi-armed bandits, where, specifically, an independent bandit learner is placed in each state. We show that, under ergodicity and fast-mixing assumptions, any slowly changing adversarial bandit algorithm achieving optimal regret in the adversarial bandit setting can also attain optimal expected regret in infinite-horizon discounted …
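The reduction described in the abstract can be illustrated with a minimal sketch: one independent adversarial bandit learner per state of a tabular MDP, each choosing the action whenever its state is visited. The sketch below uses Exp3 as the bandit algorithm and a hypothetical two-state, two-action toy MDP; it feeds each learner its immediate reward as a simplified stand-in for the discounted value feedback the paper's analysis actually requires, so it shows the structure of the reduction, not the paper's exact construction.

```python
import math
import random


class Exp3:
    """Exp3 adversarial multi-armed bandit learner.

    Maintains exponential weights over arms and mixes in uniform
    exploration with rate `explore` (rewards assumed in [0, 1]).
    """

    def __init__(self, n_arms: int, explore: float = 0.1):
        self.n = n_arms
        self.explore = explore
        self.weights = [1.0] * n_arms

    def probs(self) -> list[float]:
        total = sum(self.weights)
        return [(1 - self.explore) * w / total + self.explore / self.n
                for w in self.weights]

    def select(self) -> tuple[int, list[float]]:
        p = self.probs()
        arm = random.choices(range(self.n), weights=p)[0]
        return arm, p

    def update(self, arm: int, reward: float, p_arm: float) -> None:
        # Importance-weighted reward estimate, standard for Exp3.
        est = reward / p_arm
        self.weights[arm] *= math.exp(self.explore * est / self.n)


def run_reduction(n_steps: int = 2000, n_states: int = 2,
                  n_actions: int = 2) -> float:
    """Black-box reduction sketch: one bandit learner per state."""
    learners = [Exp3(n_actions) for _ in range(n_states)]
    state = 0
    total_reward = 0.0
    for _ in range(n_steps):
        arm, p = learners[state].select()
        # Hypothetical toy MDP: action 1 pays reward 1, action 0 pays 0,
        # and the next state equals the chosen action.
        reward = 1.0 if arm == 1 else 0.0
        total_reward += reward
        # The paper's reduction feeds a discounted value estimate here;
        # we use the immediate reward as a simplified stand-in.
        learners[state].update(arm, reward, p[arm])
        state = arm
    return total_reward / n_steps
```

Because each learner only observes feedback on the steps its own state is visited, the per-state bandits together define the overall policy, which is the sense in which the MDP problem reduces to independent adversarial bandit instances.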