Learning Contextual Bandits Through Perturbed Rewards. (arXiv:2201.09910v1 [cs.LG])
Web: http://arxiv.org/abs/2201.09910
Jan. 26, 2022, 2:10 a.m. | Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang
cs.LG updates on arXiv.org
Thanks to the power of representation learning, neural contextual bandit
algorithms demonstrate remarkable performance improvements over their
classical counterparts. However, because their exploration must be performed
over the entire neural network parameter space to achieve near-optimal
regret, the resulting computational cost is prohibitively high. We perturb
the rewards when updating the neural network, eliminating the need for
explicit exploration and the corresponding computational overhead. We prove
that a $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable
under standard …
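The core idea in the abstract, perturbing observed rewards so that a greedy policy explores implicitly, can be sketched in a few lines. This is a simplified linear-model sketch, not the paper's neural algorithm: the dimensions, horizon, and perturbation scale `sigma` are all illustrative assumptions.

```python
import numpy as np

# Sketch of perturbed-reward exploration in a contextual bandit, using a
# linear reward model in place of the paper's neural network. All
# hyperparameters (d, K, T, sigma) are illustrative assumptions.
rng = np.random.default_rng(0)

d, K, T = 5, 4, 500          # feature dim, arms per round, horizon
sigma = 0.5                  # reward-perturbation scale (assumed)
theta_true = rng.normal(size=d)

hist_x, hist_r = [], []      # observed contexts and rewards
total_reward = 0.0

for t in range(T):
    X = rng.normal(size=(K, d))              # one context vector per arm
    if hist_x:
        X_h = np.asarray(hist_x)
        r_h = np.asarray(hist_r)
        # Perturb every historical reward with fresh Gaussian noise, then
        # fit ridge regression on the perturbed history. The noise
        # randomizes the estimate, so acting greedily on it explores
        # without any explicit exploration bonus or confidence set.
        z = rng.normal(scale=sigma, size=len(r_h))
        theta_hat = np.linalg.solve(X_h.T @ X_h + np.eye(d),
                                    X_h.T @ (r_h + z))
    else:
        theta_hat = rng.normal(size=d)       # no data yet: random guess
    arm = int(np.argmax(X @ theta_hat))      # greedy on perturbed estimate
    r = float(X[arm] @ theta_true + 0.1 * rng.normal())
    hist_x.append(X[arm])
    hist_r.append(r)
    total_reward += r
```

The design point is that the randomness lives entirely in the fitting step, so the per-round decision stays a cheap argmax; the paper's contribution is showing this style of perturbation still yields a $\tilde{O}(\tilde{d}\sqrt{T})$ regret bound when the model is a neural network.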
Latest AI/ML/Big Data Jobs
Data Scientist
@ Fluent, LLC | Boca Raton, Florida, United States
Big Data ETL Engineer
@ Binance.US | Vancouver
Data Scientist / Data Engineer
@ Kin + Carta | Chicago
Data Engineer
@ Craft | Warsaw, Masovian Voivodeship, Poland
Senior Manager, Data Analytics Audit
@ Affirm | Remote US
Data Scientist - Nationwide Opportunities, AWS Professional Services
@ Amazon.com | US, NC, Virtual Location - N Carolina