all AI news
Batched Nonparametric Contextual Bandits
Feb. 28, 2024, 5:43 a.m. | Rong Jiang, Cong Ma
cs.LG updates on arXiv.org arxiv.org
Abstract: We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose Batched Successive Elimination with Dynamic Binning (BaSEDB) that achieves optimal regret (up to logarithmic factors). In essence, BaSEDB dynamically splits the covariate space into smaller bins, …
abstract arxiv constraints cs.lg function math.st minimax policy stat.ml stat.th study the end type updates
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Data Engineer - AWS
@ 3Pillar Global | Costa Rica
Cost Controller/ Data Analyst - India
@ John Cockerill | Mumbai, India, India, India