Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs. (arXiv:2010.00587v3 [cs.LG] UPDATED)
Jan. 4, 2022, 2:10 a.m. | Jiafan He, Dongruo Zhou, Quanquan Gu
cs.LG updates on arXiv.org
We study the reinforcement learning problem for discounted Markov Decision
Processes (MDPs) under the tabular setting. We propose a model-based algorithm
named UCBVI-$\gamma$, which is based on the \emph{optimism in the face of
uncertainty} principle and a Bernstein-type bonus. We show that
UCBVI-$\gamma$ achieves an $\tilde{O}\big({\sqrt{SAT}}/{(1-\gamma)^{1.5}}\big)$
regret, where $S$ is the number of states, $A$ is the number of actions,
$\gamma$ is the discount factor and $T$ is the number of steps. In addition, we
construct a class of hard …
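
To get a feel for how the stated regret bound scales, the following is a minimal Python sketch that evaluates only its leading-order term, $\sqrt{SAT}/(1-\gamma)^{1.5}$. The function name `regret_leading_term` and the example values of $S$, $A$, $T$ and $\gamma$ are illustrative assumptions, not taken from the paper, and the constants and logarithmic factors hidden by the $\tilde{O}$ notation are dropped, so the numbers indicate scaling rather than actual regret.

```python
import math


def regret_leading_term(S: int, A: int, T: int, gamma: float) -> float:
    """Leading-order term sqrt(S*A*T) / (1 - gamma)**1.5 of the stated bound.

    Assumption: constants and the logarithmic factors hidden by the
    tilde-O notation are dropped, so this is a scaling guide only.
    """
    if S <= 0 or A <= 0 or T <= 0:
        raise ValueError("S, A and T must be positive")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return math.sqrt(S * A * T) / (1.0 - gamma) ** 1.5


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to illustrate the scaling.
    for gamma in (0.9, 0.99):
        for T in (10**4, 10**6):
            term = regret_leading_term(S=20, A=5, T=T, gamma=gamma)
            print(f"gamma={gamma}, T={T}: term={term:.3e}, per-step={term / T:.3e}")
```

Quadrupling $T$ doubles the term, so the per-step average shrinks like $1/\sqrt{T}$, while moving $\gamma$ closer to 1 inflates it through the $(1-\gamma)^{-1.5}$ factor.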