Feb. 28, 2024, 5:43 a.m. | Rong Jiang, Cong Ma

cs.LG updates on arXiv.org arxiv.org

arXiv:2402.17732v1 Announce Type: cross
Abstract: We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose Batched Successive Elimination with Dynamic Binning (BaSEDB) that achieves optimal regret (up to logarithmic factors). In essence, BaSEDB dynamically splits the covariate space into smaller bins, …

abstract arxiv constraints cs.lg function math.st minimax policy stat.ml stat.th study the end type updates

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Data Engineer - AWS

@ 3Pillar Global | Costa Rica

Cost Controller/ Data Analyst - India

@ John Cockerill | Mumbai, India, India, India