First-order Policy Optimization for Robust Markov Decision Process. (arXiv:2209.10579v1 [cs.LG])
Sept. 23, 2022, 1:11 a.m. | Yan Li, Tuo Zhao, Guanghui Lan
cs.LG updates on arXiv.org arxiv.org
We consider the problem of solving a robust Markov decision process (MDP),
which involves a set of discounted, finite-state, finite-action MDPs with
uncertain transition kernels. The goal of planning is to find a robust policy
that optimizes the worst-case value against the transition uncertainties, and
thus encompasses standard MDP planning as a special case. For
$(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we develop a
policy-based first-order method, namely the robust policy mirror descent
(RPMD), and establish an $\mathcal{O}(\log(1/\epsilon))$ and
$\mathcal{O}(1/\epsilon)$ …
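The worst-case planning objective described in the abstract can be sketched with a toy robust Bellman backup plus a generic KL-proximal (mirror descent) policy step. This is a minimal illustration under assumptions: the $(\mathbf{s},\mathbf{a})$-rectangular uncertainty set is modeled here as a hypothetical finite list of candidate transition kernels per state-action pair, and `pmd_step` is a standard policy mirror descent update, not necessarily the exact RPMD iteration from the paper.

```python
import numpy as np

def robust_bellman_backup(V, kernels, rewards, gamma):
    """One robust value-iteration step for an (s,a)-rectangular
    uncertainty set given as a finite set of candidate kernels
    (hypothetical setup for illustration, not the paper's method).

    V:       (S,)        current value function
    kernels: (K, S, A, S) K candidate transition kernels P_k(s'|s,a)
    rewards: (S, A)
    """
    # Q_k[s, a] = r[s, a] + gamma * sum_s' P_k(s'|s, a) V(s')
    Q_all = rewards[None, :, :] + gamma * np.einsum('ksab,b->ksa', kernels, V)
    # Worst case over the uncertainty set, then greedy over actions.
    Q_worst = Q_all.min(axis=0)          # (S, A)
    return Q_worst.max(axis=1)           # (S,)

def pmd_step(pi, Q_worst, eta):
    """Generic KL-proximal (multiplicative-weights) policy update on the
    worst-case Q-values; a standard policy mirror descent step, hedged:
    the RPMD update in the paper may differ in details.

    pi: (S, A) current policy, Q_worst: (S, A), eta: stepsize.
    """
    logits = np.log(pi) + eta * Q_worst
    new_pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```

Taking the minimum over candidate kernels before the maximization over actions is what makes the backup robust: the resulting policy is evaluated against the least favorable transition model in the set.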