Sept. 23, 2022, 1:11 a.m. | Yan Li, Tuo Zhao, Guanghui Lan

cs.LG updates on arXiv.org

We consider the problem of solving robust Markov decision processes (MDPs),
which involve a set of discounted, finite-state, finite-action MDPs with
uncertain transition kernels. The goal of planning is to find a robust policy
that optimizes the worst-case values against the transition uncertainties,
and thus encompasses standard MDP planning as a special case. For
$(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we develop a
policy-based first-order method, namely the robust policy mirror descent
(RPMD), and establish an $\mathcal{O}(\log(1/\epsilon))$ and
$\mathcal{O}(1/\epsilon)$ …

