Nov. 16, 2022, 2:12 a.m. | Guanhua Fang, Ping Li, Gennady Samorodnitsky

cs.LG updates on arXiv.org

We study an important variant of the stochastic multi-armed bandit (MAB)
problem that takes penalization into consideration. Instead of directly
maximizing the cumulative expected reward, the learner must balance the total
reward against a fairness level. In this paper, we present new insights into
MAB and formulate the problem in a penalization framework, in which a rigorous
penalized regret can be defined and a more refined regret analysis is
possible. Under this framework, we propose a hard-threshold UCB-like
algorithm, which enjoys …
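
The truncated abstract does not spell out the algorithm, so the following is only a minimal sketch of what a hard-threshold UCB-like rule could look like, not the authors' method. It assumes Bernoulli rewards, a standard UCB1 confidence bonus, a per-arm fairness level `tau[i]` (a target fraction of pulls), and a per-arm `penalty[i]` that is added in full to an arm's index whenever that arm has been pulled less often than its target share so far. The function name `hard_threshold_ucb` and all of its parameters are hypothetical.

```python
import math
import random


def hard_threshold_ucb(means, tau, penalty, horizon, seed=0):
    """Illustrative hard-threshold UCB on simulated Bernoulli arms.

    means   -- true success probabilities (used only to simulate rewards)
    tau     -- assumed per-arm fairness levels (target pull fractions)
    penalty -- assumed per-arm bonus applied while an arm is under target
    Returns the number of pulls of each arm after `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            def index(i):
                mean = totals[i] / pulls[i]
                bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
                # Hard threshold: the whole penalty is added as soon as the
                # arm's pull count falls below its target share so far.
                below = pulls[i] < tau[i] * (t - 1)
                return mean + bonus + (penalty[i] if below else 0.0)

            arm = max(range(k), key=index)

        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward

    return pulls


if __name__ == "__main__":
    counts = hard_threshold_ucb(
        means=[0.2, 0.5, 0.8],
        tau=[0.2, 0.2, 0.2],
        penalty=[10.0, 10.0, 10.0],
        horizon=2000,
    )
    print(counts)
```

With all penalties set to zero the index reduces to plain UCB1, which concentrates its pulls on the best arm. A large penalty forces every arm to stay close to its target share, trading reward for fairness.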
