all AI news
Transfer in Sequential Multi-armed Bandits via Reward Samples
March 20, 2024, 4:41 a.m. | Rahul N R, Vaibhav Katewa
cs.LG updates on arXiv.org arxiv.org
Abstract: We consider a sequential stochastic multi-armed bandit problem where the agent interacts with bandit over multiple episodes. The reward distribution of the arms remain constant throughout an episode but can change over different episodes. We propose an algorithm based on UCB to transfer the reward samples from the previous episodes and improve the cumulative regret performance over all the episodes. We provide regret analysis and empirical results for our algorithm, which show significant improvement over …
abstract agent algorithm arxiv change cs.lg distribution episodes multi-armed bandits multiple samples stat.ml stochastic transfer type via
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Principal Data Engineering Manager
@ Microsoft | Redmond, Washington, United States
Machine Learning Engineer
@ Apple | San Diego, California, United States