Web: http://arxiv.org/abs/2112.03376

Jan. 31, 2022, 2:11 a.m. | Michael Rawson, Radu Balan

cs.LG updates on arXiv.org

Policy learning is a rapidly growing area. As robots and computers increasingly
control day-to-day life, their error rates need to be minimized and controlled.
Many policy learning and bandit methods come with provable error rates. We show
a regret (error) bound and convergence for the Deep Epsilon Greedy method, which
chooses actions using a neural network's prediction. We also show that the
regret upper bound of the Epsilon Greedy method is minimized with cubic root
exploration. In experiments …
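
The exploration idea in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that "cubic root exploration" means exploring with probability t^(-1/3) at step t; the abstract does not state the exact schedule. It also swaps the neural network predictor for a per-arm running mean on a hypothetical Gaussian bandit, so it shows plain Epsilon Greedy rather than the Deep Epsilon Greedy method itself. The names epsilon_schedule and run_epsilon_greedy are invented for this example.

```python
import random


def epsilon_schedule(t: int) -> float:
    """Assumed exploration probability at step t (1-indexed): t ** (-1/3), capped at 1."""
    return min(1.0, t ** (-1.0 / 3.0))


def run_epsilon_greedy(true_means, steps, seed=0):
    """Run Epsilon Greedy on a Gaussian bandit (illustrative setup, not from the paper).

    Returns (estimates, counts, regret), where regret is the cumulative gap
    between the best arm's mean and the mean of each arm actually pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward per arm (stand-in for a network)
    counts = [0] * n_arms       # number of pulls per arm
    best_mean = max(true_means)
    regret = 0.0

    for t in range(1, steps + 1):
        if rng.random() < epsilon_schedule(t):
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)  # unit-variance Gaussian reward
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best_mean - true_means[arm]

    return estimates, counts, regret


if __name__ == "__main__":
    est, cnt, reg = run_epsilon_greedy([0.1, 0.5, 0.9], steps=5000)
    print("pull counts:", cnt)
    print("cumulative regret:", round(reg, 2))
```

Because the exploration probability decays over time, the share of random pulls shrinks as the estimates improve; changing the exponent in epsilon_schedule is a simple way to compare other decay rates under the same assumptions.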

