Web: http://arxiv.org/abs/2203.09251

June 20, 2022, 1:11 a.m. | Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

cs.LG updates on arXiv.org

In probably approximately correct (PAC) reinforcement learning (RL), an agent
is required to identify an $\epsilon$-optimal policy with probability at least
$1-\delta$. While minimax-optimal algorithms exist for this problem, its
instance-dependent complexity remains elusive in episodic Markov decision
processes (MDPs). In this paper, we propose the first (nearly) matching upper
and lower bounds on the sample complexity of PAC RL in deterministic episodic
MDPs with finite state and action spaces. In particular, our bounds feature a
new notion of sub-optimality gap …
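
To make the $\epsilon$-optimality criterion concrete, here is a minimal sketch in Python, assuming a toy deterministic episodic MDP. The instance, the function names (`optimal_values`, `policy_value`, `is_eps_optimal`), and all numbers are hypothetical illustrations and are not taken from the paper. The sketch only checks the $\epsilon$ part of the PAC requirement for a given policy with known dynamics; it does not implement the paper's algorithm, its bounds, or its notion of sub-optimality gap. In the PAC setting the agent must output such a policy from samples, with probability at least $1-\delta$.

```python
# Hypothetical toy instance: H stages, S states, A actions (illustrative only).
H, S, A = 2, 2, 2
# next_state[h][s][a] is the deterministic successor of (s, a) at stage h.
next_state = [[[0, 1], [0, 1]], [[0, 0], [0, 0]]]
# reward[h][s][a] is the (here known, deterministic) reward at stage h.
reward = [[[0.0, 0.1], [0.0, 0.0]], [[0.5, 0.2], [1.0, 0.3]]]


def optimal_values(H, S, A, next_state, reward):
    """Backward induction: V[h][s] is the optimal return from state s at stage h."""
    V = [[0.0] * S for _ in range(H + 1)]  # V[H][s] = 0 (end of episode)
    for h in range(H - 1, -1, -1):
        for s in range(S):
            V[h][s] = max(
                reward[h][s][a] + V[h + 1][next_state[h][s][a]] for a in range(A)
            )
    return V


def policy_value(policy, s0, H, next_state, reward):
    """Return of a deterministic policy (policy[h][s] -> action) started in s0."""
    total, s = 0.0, s0
    for h in range(H):
        a = policy[h][s]
        total += reward[h][s][a]
        s = next_state[h][s][a]
    return total


def is_eps_optimal(policy, eps, s0=0):
    """True if the policy's value is within eps of the optimal value at s0."""
    v_star = optimal_values(H, S, A, next_state, reward)[0][s0]
    return v_star - policy_value(policy, s0, H, next_state, reward) <= eps


# A suboptimal policy (always action 0) and an optimal one for this toy instance.
always_zero = [[0, 0], [0, 0]]
best = [[1, 0], [0, 0]]
print(is_eps_optimal(always_zero, eps=0.5))  # value gap is about 0.6
print(is_eps_optimal(best, eps=0.0))
```

Because transitions are deterministic, evaluating a policy amounts to following a single trajectory, which is why `policy_value` needs no expectation over next states.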


