Web: http://arxiv.org/abs/2203.09251

June 20, 2022, 1:11 a.m. | Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

cs.LG updates on arXiv.org

In probably approximately correct (PAC) reinforcement learning (RL), an agent
is required to identify an $\epsilon$-optimal policy with probability
$1-\delta$. While minimax optimal algorithms exist for this problem, its
instance-dependent complexity remains elusive in episodic Markov decision
processes (MDPs). In this paper, we propose the first (nearly) matching upper
and lower bounds on the sample complexity of PAC RL in deterministic episodic
MDPs with finite state and action spaces. In particular, our bounds feature a
new notion of sub-optimality gap …
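
To make the $\epsilon$-optimality criterion concrete, here is a minimal sketch, not the authors' algorithm. It assumes a fully known, stage-independent deterministic episodic MDP, which is not the learning setting of the paper, where the agent has to gather samples. The names `backward_induction`, `policy_return`, `is_epsilon_optimal` and the toy MDP are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

State = int
Action = int


def backward_induction(
    horizon: int,
    states: List[State],
    actions: List[Action],
    next_state: Dict[Tuple[State, Action], State],
    reward: Dict[Tuple[State, Action], float],
):
    """Compute optimal values V[h][s] and a greedy policy for a known model."""
    # V[horizon][s] = 0: nothing can be collected after the last stage.
    V = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    policy: List[Dict[State, Action]] = [dict() for _ in range(horizon)]
    for h in reversed(range(horizon)):
        for s in states:
            # Deterministic transitions: each action has exactly one successor.
            q = {a: reward[(s, a)] + V[h + 1][next_state[(s, a)]] for a in actions}
            best = max(q, key=q.get)
            policy[h][s] = best
            V[h][s] = q[best]
    return V, policy


def policy_return(policy, horizon, s0, next_state, reward) -> float:
    """Return collected by a deterministic policy started from s0."""
    total, s = 0.0, s0
    for h in range(horizon):
        a = policy[h][s]
        total += reward[(s, a)]
        s = next_state[(s, a)]
    return total


def is_epsilon_optimal(policy, epsilon, horizon, states, actions,
                       s0, next_state, reward) -> bool:
    """Check V*(s0) - V^pi(s0) <= epsilon in the known model."""
    V, _ = backward_induction(horizon, states, actions, next_state, reward)
    return V[0][s0] - policy_return(policy, horizon, s0, next_state, reward) <= epsilon


# Hypothetical toy MDP: 2 states, 2 actions, horizon 2, start state 0.
STATES, ACTIONS, HORIZON, S0 = [0, 1], [0, 1], 2, 0
NEXT = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
REWARD = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.0}

# A myopic policy that always plays action 0.
MYOPIC = [{0: 0, 1: 0} for _ in range(HORIZON)]
```

In this toy instance the myopic policy falls short of the optimal return from the start state by about 0.2, so it passes the check for $\epsilon = 0.25$ and fails it for $\epsilon = 0.1$. A PAC algorithm must output a policy passing such a check with probability at least $1-\delta$, without access to the true model.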
