Web: http://arxiv.org/abs/2108.02717

June 23, 2022, 1:12 a.m. | Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

stat.ML updates on arXiv.org arxiv.org

The theory of reinforcement learning has focused on two fundamental problems:
achieving low regret, and identifying $\epsilon$-optimal policies. While a
simple reduction allows one to apply a low-regret algorithm to obtain an
$\epsilon$-optimal policy and achieve the worst-case optimal rate, it is
unknown whether low-regret algorithms can obtain the instance-optimal rate for
policy identification. We show this is not possible -- there exists a
fundamental tradeoff between achieving low regret and identifying an
$\epsilon$-optimal policy at the instance-optimal rate.


Motivated …

arxiv learning lg reinforcement reinforcement learning

More from arxiv.org / stat.ML updates on arXiv.org

Machine Learning Researcher - Saalfeld Lab

@ Howard Hughes Medical Institute - Chevy Chase, MD | Ashburn, Virginia

Project Director, Machine Learning in US Health

@ ideas42.org | Remote, US

Data Science Intern

@ NannyML | Remote

Machine Learning Engineer NLP/Speech

@ Play.ht | Remote

Research Scientist, 3D Reconstruction

@ Yembo | Remote, US

Clinical Assistant or Associate Professor of Management Science and Systems

@ University at Buffalo | Buffalo, NY