April 28, 2022, 1:11 a.m. | Zongqi Wan, Xiaoming Sun, Jialin Zhang

cs.LG updates on arXiv.org

We study the adversarial bandit problem with composite anonymous delayed
feedback. In this setting, the loss of an action is split into $d$ components
that are spread over consecutive rounds after the action is chosen. In each
round, the algorithm observes only the aggregate of the loss components arriving
from the latest $d$ rounds. Previous work focuses on the oblivious adversarial
setting, while we investigate the harder non-oblivious setting. We show that the
non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss
sequence has bounded memory. …
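
The feedback model described above can be illustrated with a minimal sketch. It is not the paper's algorithm or notation: the function name `composite_anonymous_feedback`, the 0-based round indexing, and the fixed table `components` are assumptions made for illustration. In particular, the table is fixed in advance, which corresponds to the simpler oblivious case; a non-oblivious adversary could choose these values based on the learner's past actions.

```python
def composite_anonymous_feedback(actions, components):
    """Return the aggregate loss the learner observes in each round.

    actions[t] is the arm played in round t (hypothetical 0-based indexing).
    components[t][a] is a list of d numbers: the pieces of arm a's round-t
    loss, where piece s is delivered in round t + s.
    """
    T = len(actions)
    observed = [0.0] * T
    for t, arm in enumerate(actions):
        for s, piece in enumerate(components[t][arm]):
            if t + s < T:  # pieces that land past the horizon are never seen
                observed[t + s] += piece
    return observed


# Two arms, d = 2, three rounds (illustrative values only).
components = [
    [[0.5, 0.25], [0.0, 1.0]],   # round 0: pieces for arm 0, arm 1
    [[0.25, 0.25], [0.5, 0.0]],  # round 1
    [[1.0, 0.0], [0.25, 0.5]],   # round 2
]
actions = [0, 1, 0]
print(composite_anonymous_feedback(actions, components))  # [0.5, 0.75, 1.0]
```

In round 1 the learner sees 0.75, which mixes the delayed piece of the round-0 action (0.25) with the first piece of the round-1 action (0.5). The learner only receives the sum, so it cannot tell which action contributed which part; this is the "anonymous" aspect of the feedback.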

