Web: http://arxiv.org/abs/2201.09910

Jan. 26, 2022, 2:10 a.m. | Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

cs.LG updates on arXiv.org

Thanks to the power of representation learning, neural contextual bandit
algorithms demonstrate remarkable performance improvements over their
classical counterparts. However, because their exploration must be performed in
the entire neural network parameter space to obtain nearly optimal regret, the
resulting computational cost is prohibitively high. We perturb the rewards when
updating the neural network to eliminate the need for explicit exploration and
the corresponding computational overhead. We prove that a
$\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable under
standard …
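The core idea, replacing explicit exploration with reward perturbation at training time, can be sketched as follows. This is an illustrative toy, not the paper's actual algorithm or analysis: the one-hidden-layer network, the SGD update, the linear toy environment, and the perturbation scale `sigma` are all assumptions made for the sketch. The agent always acts greedily with respect to its current network; exploration comes only from the noise injected into the rewards it trains on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (assumed for illustration): K arms per round, each with a
# random d-dimensional context; the true reward is linear in the context.
d, K, T = 4, 5, 200
theta_true = rng.normal(size=d)

# A tiny one-hidden-layer ReLU network standing in for the reward estimator.
H = 16
W1 = rng.normal(scale=0.5, size=(d, H))
w2 = rng.normal(scale=0.5, size=H)

def predict(X):
    # Forward pass for a batch of contexts: (K, d) -> (K,)
    return np.maximum(X @ W1, 0.0) @ w2

def sgd_step(x, y, lr=0.05):
    # One squared-loss SGD step on a single (context, target) pair.
    global W1, w2
    h = np.maximum(x @ W1, 0.0)
    g = h @ w2 - y                            # scalar residual
    grad_w2 = g * h
    grad_W1 = np.outer(x, g * w2 * (h > 0))   # ReLU subgradient
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

sigma = 0.5  # perturbation scale (assumed hyperparameter)
total_reward = 0.0
for t in range(T):
    X = rng.normal(size=(K, d))
    a = int(np.argmax(predict(X)))            # purely greedy action choice
    r = X[a] @ theta_true + 0.1 * rng.normal()
    total_reward += r
    # Key step: train on a *perturbed* reward rather than adding an explicit
    # exploration bonus, so no search over the network parameter space is needed.
    sgd_step(X[a], r + sigma * rng.normal())
```

The perturbation plays the role that an optimistic bonus or posterior sample would otherwise play: randomizing the training targets randomizes the fitted network, and hence the greedy policy, without any per-round confidence-set computation.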

