May 11, 2022, 1:11 a.m. | Andreas Haupt, Aroon Narayanan

cs.LG updates on arXiv.org

Consider a bandit learning environment. We demonstrate that popular learning
algorithms such as Upper Confidence Bound (UCB) and $\varepsilon$-Greedy exhibit
risk aversion: when presented with two arms of the same expectation but
different variance, the algorithms tend not to choose the riskier, i.e.
higher-variance, arm. We prove that $\varepsilon$-Greedy chooses the risky arm
with probability tending to $0$ when faced with a deterministic arm and a
Rademacher-distributed arm. We show experimentally that UCB also exhibits
risk-averse behavior, and that risk …
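To make the claimed effect concrete, here is a minimal simulation sketch of the setting the abstract describes. It is not the paper's construction: the horizon, the value of eps, and the random tie-breaking rule are illustrative assumptions. An $\varepsilon$-Greedy learner faces a deterministic arm paying 0 and a Rademacher arm paying $\pm 1$ with equal probability, so both arms have mean 0, and we track the fraction of pulls given to the risky arm.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_greedy(horizon=100_000, eps=0.1):
    """epsilon-Greedy on two arms with equal mean 0:
    arm 0 is deterministic (always 0), arm 1 is Rademacher (+1 or -1)."""
    counts = np.zeros(2)   # number of pulls per arm
    means = np.zeros(2)    # empirical mean reward per arm
    risky_pulls = 0
    for _ in range(horizon):
        if rng.random() < eps:
            # explore: pick an arm uniformly at random
            arm = int(rng.integers(2))
        else:
            # exploit: pick an empirically best arm, breaking ties at random
            best = np.flatnonzero(means == means.max())
            arm = int(rng.choice(best))
        reward = 0.0 if arm == 0 else float(rng.choice([-1.0, 1.0]))
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        risky_pulls += arm
    return risky_pulls / horizon

print(f"fraction of pulls on the risky arm: {eps_greedy():.3f}")
```

Under these assumptions the risky arm's share of pulls falls well below 1/2, consistent with the abstract's claim: once the Rademacher arm's empirical mean dips below zero, the greedy step avoids it, and only forced exploration revisits it.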

