Oct. 11, 2022, 1:13 a.m. | Tomer Gafni, Michal Yemini, Kobi Cohen

cs.LG updates on arXiv.org arxiv.org

We consider an extension to the restless multi-armed bandit (RMAB) problem
with unknown arm dynamics, where an unknown exogenous global Markov process
governs the rewards distribution of each arm. Under each global state, the
rewards process of each arm evolves according to an unknown Markovian rule,
which is non-identical among different arms. At each time, a player chooses an
arm out of N arms to play, and receives a random reward from a finite set of
reward states. The arms …

arxiv exogenous global markov multi-armed bandits process

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead Data Engineer

@ WorkMoney | New York City, United States - Remote