April 10, 2024, 4:41 a.m. | Xuheng Li, Heyang Zhao, Quanquan Gu

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.06013v1 Announce Type: new
Abstract: Contextual dueling bandits, where a learner compares two options based on context and receives feedback indicating which was preferred, extend classic dueling bandits by incorporating contextual information for decision-making and preference learning. Several algorithms based on the upper confidence bound (UCB) have been proposed for linear contextual dueling bandits. However, no algorithm based on posterior sampling has been developed in this setting, despite the empirical success observed in traditional contextual bandits. In this paper, we …
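To make the setting concrete, below is a minimal sketch (not the paper's algorithm, which is truncated above) of posterior-sampling-style selection in a linear contextual dueling bandit. The assumptions are mine: a Bradley-Terry preference model with an unknown linear parameter, a Gaussian posterior approximation built from a ridge-style update on feature differences, and a heuristic that draws two independent posterior samples to pick the pair of arms to compare.

```python
# A hedged sketch of posterior (Thompson-style) sampling for a linear
# contextual dueling bandit. Everything here is an illustrative assumption,
# not the algorithm proposed in arXiv:2404.06013.
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000          # feature dimension, arms per round, horizon
lam, sigma = 1.0, 0.5          # ridge regularizer and noise scale (assumed)

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # hidden preference parameter

V = lam * np.eye(d)            # regularized design matrix of feature differences
b_vec = np.zeros(d)            # accumulated (signed feedback * feature difference)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for t in range(T):
    # Contextual arm features for this round (i.i.d. here purely for the demo).
    Phi = rng.normal(size=(K, d)) / np.sqrt(d)

    # Gaussian posterior approximation N(theta_hat, sigma^2 * V^{-1}).
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b_vec
    cov = sigma**2 * V_inv

    # Draw two independent posterior samples and pick one arm with each
    # (a simple Thompson-style pairing heuristic).
    s1 = rng.multivariate_normal(theta_hat, cov)
    s2 = rng.multivariate_normal(theta_hat, cov)
    arm_a = int(np.argmax(Phi @ s1))
    arm_b = int(np.argmax(Phi @ s2))

    # Preference feedback: y = 1 if arm_a beats arm_b under a Bradley-Terry model.
    z = Phi[arm_a] - Phi[arm_b]
    y = rng.random() < sigmoid(theta_star @ z)

    # Ridge-style update on the feature difference (a crude surrogate for the
    # logistic-likelihood update a real dueling-bandit algorithm would use).
    V += np.outer(z, z)
    b_vec += (1.0 if y else -1.0) * z
```

The key design choice illustrated is that the learner only ever observes relative feedback on a *pair* of arms, so the posterior is maintained over feature differences rather than individual arm rewards.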

