Partial Information as Full: Reward Imputation with Sketching in Bandits. (arXiv:2210.06719v2 [cs.LG] UPDATED)
Oct. 21, 2022, 1:13 a.m. | Xiao Zhang, Ninglu Shao, Zihua Si, Jun Xu, Wenhan Wang, Hanjing Su, Ji-Rong Wen
cs.LG updates on arXiv.org arxiv.org
We focus on the contextual batched bandit (CBB) setting, where a batch of
rewards is observed from the environment in each episode, but the rewards of
the non-executed actions remain unobserved (i.e., partial-information feedback).
Existing approaches for CBB usually ignore the rewards of the non-executed
actions, so the available feedback is underutilized. In this paper,
we propose an efficient reward imputation approach using sketching for CBB,
which completes the unobserved rewards with imputed rewards that approximate
the full-information feedback. …
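
To make the idea of completing unobserved rewards more concrete, here is a minimal illustrative sketch in Python with NumPy. It is not the paper's algorithm: it assumes a linear reward model per action, fills in the rewards of non-executed actions with discounted model predictions, and compresses each batch with a Gaussian random sketch before a ridge-regression update. The function names `impute_rewards` and `sketched_ridge_stats`, the discount `gamma`, the regularizer `lam`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def impute_rewards(contexts, actions, rewards, thetas, gamma=1.0):
    """Build an (N, K) reward matrix for one episode.

    contexts: (N, d) context vectors; actions: (N,) executed action indices;
    rewards: (N,) observed rewards; thetas: (K, d) current per-action estimates.
    Non-executed actions get gamma * predicted reward (an assumed imputation rule);
    executed actions keep the reward actually observed.
    """
    completed = gamma * contexts @ thetas.T            # predictions for every action
    completed[np.arange(len(actions)), actions] = rewards  # overwrite with observations
    return completed

def sketched_ridge_stats(contexts, targets, sketch_size, rng):
    """Compress the batch with a Gaussian sketch and return Gram statistics."""
    n = contexts.shape[0]
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    SX, Sy = S @ contexts, S @ targets                 # sketched contexts and rewards
    return SX.T @ SX, SX.T @ Sy

# Synthetic episode (all values below are made up for the example).
rng = np.random.default_rng(0)
N, d, K = 200, 5, 3
contexts = rng.standard_normal((N, d))
true_thetas = rng.standard_normal((K, d))
actions = rng.integers(0, K, size=N)
rewards = np.einsum("nd,nd->n", contexts, true_thetas[actions])
rewards = rewards + 0.1 * rng.standard_normal(N)

thetas = np.zeros((K, d))                              # initial estimates
completed = impute_rewards(contexts, actions, rewards, thetas, gamma=0.5)

lam = 1.0                                              # ridge regularization
for a in range(K):
    A, b = sketched_ridge_stats(contexts, completed[:, a], sketch_size=50, rng=rng)
    thetas[a] = np.linalg.solve(A + lam * np.eye(d), b)
```

In this sketch every action is updated from all `N` contexts in the batch rather than only from the contexts in which it was executed, which is the sense in which partial feedback is treated as if it were full. The sketch size trades accuracy for a smaller matrix product.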