March 11, 2024, 4:42 a.m. | Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar

cs.LG updates on arXiv.org

arXiv:2109.03396v2 Announce Type: replace
Abstract: In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $O(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions, and $T$ is the horizon. We consider the online setting where the opponent can …
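
To make the scaling of the stated bound concrete, here is a minimal Python sketch that evaluates only the growth term $HS\sqrt{AT}$. It is not the PSRL-ZSG algorithm. The function name `regret_bound_scale`, the hidden constant being set to 1, and the numeric values of $H$, $S$, $A$, and $T$ are all assumptions chosen for illustration.

```python
import math


def regret_bound_scale(H: float, S: int, A: int, T: int) -> float:
    """Growth term H * S * sqrt(A * T) of the bound.

    The constant hidden by the O(.) notation is assumed to be 1,
    so only ratios between values are meaningful.
    """
    return H * S * math.sqrt(A * T)


# Hypothetical problem sizes, chosen only for illustration.
H, S, A = 5.0, 10, 4

base = regret_bound_scale(H, S, A, T=10_000)
quad = regret_bound_scale(H, S, A, T=40_000)

# Quadrupling the horizon T doubles the growth term (sqrt(4) = 2).
ratio = quad / base

# Dividing by T gives the per-step term, which shrinks like 1 / sqrt(T).
per_step_base = base / 10_000
per_step_quad = quad / 40_000

print(ratio, per_step_base, per_step_quad)
```

Under these assumptions the bound grows linearly in $H$ and $S$ but only with the square root of $A$ and $T$, so the per-step term $HS\sqrt{A/T}$ vanishes as $T$ grows.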

