Dynamic Regret of Online Markov Decision Processes. (arXiv:2208.12483v1 [cs.LG])
Aug. 29, 2022, 1:12 a.m. | Peng Zhao, Long-Fei Li, Zhi-Hua Zhou
stat.ML updates on arXiv.org arxiv.org
We investigate online Markov Decision Processes (MDPs) with adversarially
changing loss functions and known transitions. We choose dynamic regret as the
performance measure, defined as the performance difference between the learner
and any sequence of feasible changing policies. This measure is strictly
stronger than the standard static regret, which benchmarks the learner's
performance against a single fixed comparator policy. We consider three foundational models
of online MDPs, including episodic loop-free Stochastic Shortest Path (SSP),
episodic SSP, and infinite-horizon MDPs. For these …
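
The relationship between dynamic and static regret described in the abstract can be sketched for the episodic case as follows. The notation is an assumption made here for illustration and is not taken from the paper: K is the number of episodes, \pi_k is the learner's policy in episode k, \pi_k^c is the comparator policy in episode k, and J_k(\pi) is the expected cumulative loss of policy \pi under the loss function of episode k.

```latex
% Dynamic regret: compare against a changing comparator sequence
% \pi_1^c, ..., \pi_K^c (illustrative notation, assumed here).
\mathrm{D\text{-}Regret}_K(\pi_1^c, \dots, \pi_K^c)
  = \sum_{k=1}^{K} J_k(\pi_k) - \sum_{k=1}^{K} J_k(\pi_k^c)

% Static regret: the special case where the comparator is fixed,
% \pi_1^c = \dots = \pi_K^c = \pi.
\mathrm{Regret}_K(\pi)
  = \sum_{k=1}^{K} J_k(\pi_k) - \sum_{k=1}^{K} J_k(\pi)
```

Under this notation, any bound that holds for every feasible comparator sequence also holds for every constant sequence, which is the sense in which dynamic regret is the stronger measure.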