Aug. 24, 2022, 1:11 a.m. | Mengxin Yu, Zhuoran Yang, Jianqing Fan

cs.LG updates on arXiv.org arxiv.org

We study offline reinforcement learning under a novel model called strategic
MDP, which characterizes the strategic interactions between a principal and a
sequence of myopic agents with private types. Due to the bilevel structure and
private types, strategic MDP involves information asymmetry between the
principal and the agents. We focus on the offline RL problem, where the goal is
to learn the optimal policy of the principal concerning a target population of
agents based on a pre-collected dataset that consists …

arxiv decision information making ml rl

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US