June 20, 2022, 1:12 a.m. | Xuchuang Wang, Hong Xie, John C.S. Lui

stat.ML updates on arXiv.org

We generalize the multiple-play multi-armed bandits (MP-MAB) problem to a
shareable-arm setting, in which several plays can share the same arm.
Furthermore, each shareable arm has a finite reward capacity and a "per-load"
reward distribution, both of which are unknown to the learner. The reward from
a shareable arm is load-dependent: it is the per-load reward multiplied by the
number of plays pulling the arm, or by the arm's reward capacity when the
number of plays exceeds that capacity. When …
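
The load-dependent reward described above amounts to a simple rule: a per-load reward draw scaled by the effective load, min(number of plays, capacity). Below is a minimal sketch of that reward model, assuming a Bernoulli per-load distribution (the abstract leaves it unspecified); the function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def shareable_arm_reward(per_load_mean: float, capacity: int, num_plays: int, rng=None):
    """Sample a load-dependent reward for one shareable arm.

    The effective load is min(num_plays, capacity): plays beyond the
    capacity limit contribute nothing. The total reward is a single
    per-load reward draw scaled by that effective load. The Bernoulli
    per-load distribution is an illustrative assumption.
    """
    rng = rng or np.random.default_rng()
    effective_load = min(num_plays, capacity)
    per_load_reward = rng.binomial(1, per_load_mean)  # one per-load draw
    return per_load_reward * effective_load

# Example: 5 plays assigned to an arm with capacity 3 and per-load mean 0.6
# yields reward 0 or 3, since only 3 of the 5 plays are effective.
print(shareable_arm_reward(per_load_mean=0.6, capacity=3, num_plays=5))
```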
