all AI news
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning. (arXiv:2304.07278v1 [cs.LG])
cs.LG updates on arXiv.org
This paper studies reward-agnostic exploration in reinforcement learning (RL)
-- a scenario where the learner is unaware of the reward functions during the
exploration stage -- and designs an algorithm that improves over the state of
the art. More precisely, consider a finite-horizon non-stationary Markov
decision process with $S$ states, $A$ actions, and horizon length $H$, and
suppose that there are no more than a polynomial number of given reward
functions of interest. By collecting an order of $\frac{SAH^3}{\varepsilon^2}$ …
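As a rough, back-of-the-envelope illustration (not from the paper): the bound says the number of exploration episodes scales linearly in $S$ and $A$, cubically in $H$, and quadratically in $1/\varepsilon$. A minimal sketch of that scaling, with hypothetical values for $S$, $A$, $H$, and $\varepsilon$, and with constants and logarithmic factors omitted:

    # Back-of-the-envelope scaling of the SAH^3 / eps^2 sample-complexity bound.
    # All values are hypothetical; constants and log factors are omitted.
    S, A, H = 100, 10, 20  # states, actions, horizon length

    for eps in (0.1, 0.05, 0.01):
        episodes = S * A * H**3 / eps**2
        print(f"eps={eps}: on the order of {episodes:.2e} sample episodes")

Halving the target accuracy $\varepsilon$ quadruples the episode count, while doubling the horizon $H$ multiplies it by eight.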