Aug. 29, 2022, 1:12 a.m. | Peng Zhao, Long-Fei Li, Zhi-Hua Zhou

stat.ML updates on arXiv.org

We investigate online Markov Decision Processes (MDPs) with adversarially
changing loss functions and known transitions. We choose dynamic regret as the
performance measure, defined as the performance difference between the learner
and any sequence of feasible changing policies. This measure is strictly
stronger than the standard static regret, which benchmarks the learner's
performance against a single fixed comparator policy. We consider three
foundational models of online MDPs: episodic loop-free Stochastic Shortest Path (SSP),
episodic SSP, and infinite-horizon MDPs. For these …
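
The two regret notions mentioned above can be written out as a short sketch. The notation here is an assumption for illustration and is not taken from the abstract: K is the number of episodes, \pi_k is the learner's policy in episode k, J_k(\pi) is the expected loss of policy \pi under the k-th loss function, and \pi^c_1, ..., \pi^c_K is an arbitrary sequence of feasible comparator policies.

```latex
% Dynamic regret against a changing comparator sequence (assumed notation)
\mathrm{D\text{-}Regret}_K(\pi^c_{1:K})
  = \sum_{k=1}^{K} J_k(\pi_k) - \sum_{k=1}^{K} J_k(\pi^c_k)

% Static regret is the special case of one fixed comparator,
% \pi^c_1 = \dots = \pi^c_K = \pi^*:
\mathrm{Regret}_K
  = \sum_{k=1}^{K} J_k(\pi_k) - \min_{\pi^*} \sum_{k=1}^{K} J_k(\pi^*)
```

Under this notation, a bound on dynamic regret that holds for every comparator sequence also covers the constant sequence, which is why the dynamic measure subsumes the static one.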

