### Web: http://arxiv.org/abs/2203.09251

June 20, 2022, 1:11 a.m. | Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

In probably approximately correct (PAC) reinforcement learning (RL), an agent
is required to identify an $\epsilon$-optimal policy with probability
$1-\delta$. While minimax optimal algorithms exist for this problem, its
instance-dependent complexity remains elusive in episodic Markov decision
processes (MDPs). In this paper, we propose the first (nearly) matching upper
and lower bounds on the sample complexity of PAC RL in deterministic episodic
MDPs with finite state and action spaces. In particular, our bounds feature a
new notion of sub-optimality gap …
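As an illustrative sketch only (not the paper's algorithm, whose gap notion refines the classical one), the objects in the abstract can be made concrete on a toy instance: backward induction in a small deterministic episodic MDP yields the optimal values $V^*_h(s)$ and $Q^*_h(s,a)$, the classical sub-optimality gaps $\Delta_h(s,a) = V^*_h(s) - Q^*_h(s,a)$, and hence which actions are $\epsilon$-optimal. The MDP below is hypothetical, chosen purely for illustration.

```python
# Sketch (hypothetical instance): backward induction in a tiny deterministic
# episodic MDP, computing classical sub-optimality gaps
# Delta(h, s, a) = V*_h(s) - Q*_h(s, a).

H = 2              # horizon
S = [0, 1]         # states
A = [0, 1]         # actions

# Deterministic dynamics: (h, s, a) -> (next_state, reward).
step = {
    (0, 0, 0): (0, 1.0), (0, 0, 1): (1, 0.5),
    (0, 1, 0): (0, 0.0), (0, 1, 1): (1, 1.0),
    (1, 0, 0): (0, 1.0), (1, 0, 1): (1, 0.0),
    (1, 1, 0): (0, 0.5), (1, 1, 1): (1, 1.0),
}

V = {(H, s): 0.0 for s in S}          # terminal values are zero
Q, gap = {}, {}
for h in reversed(range(H)):          # backward induction over stages
    for s in S:
        for a in A:
            s2, r = step[(h, s, a)]
            Q[(h, s, a)] = r + V[(h + 1, s2)]
        V[(h, s)] = max(Q[(h, s, a)] for a in A)
        for a in A:
            gap[(h, s, a)] = V[(h, s)] - Q[(h, s, a)]

eps = 0.25
# An action is eps-optimal at (h, s) iff its gap is at most eps; an
# eps-optimal policy need only pick such actions along its trajectory.
eps_opt = {(h, s): [a for a in A if gap[(h, s, a)] <= eps]
           for h in range(H) for s in S}
```

In a deterministic MDP each $(h,s,a)$ leads to a single successor, which is why these gaps fully determine which state-action pairs a sample-efficient PAC algorithm must visit; the paper's bounds replace $\Delta_h(s,a)$ with a refined gap notion.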
