June 1, 2022, 1:11 a.m. | Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette

cs.LG updates on arXiv.org

The successes of deep Reinforcement Learning (RL) are limited to settings
where we have a large stream of online experiences, but applying RL in the
data-efficient setting with limited access to online interactions is still
challenging. A key to data-efficient RL is good value estimation, but current
methods in this space fail to fully utilise the structure of the trajectory
data gathered from the environment. In this paper, we treat the transition data
of the MDP as a graph, and …
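The abstract is cut off before it describes the authors' method, so the following is not their algorithm. It is a minimal, hypothetical sketch of the general idea of treating transition data as a graph. It makes several assumptions: states are discrete and hashable, so identical states from different trajectories merge into one node; transitions are stored as `(s, a, r, s_next, done)` tuples; and values are estimated by plain tabular Q-value iteration that averages over the observed outcomes of each state-action edge. The names `build_transition_graph` and `graph_value_iteration` are illustrative only.

```python
from collections import defaultdict


def build_transition_graph(trajectories):
    """Merge (s, a, r, s_next, done) tuples from all trajectories into one graph.

    Hypothetical helper: identical states seen in different trajectories
    become the same node, so information can flow between trajectories.
    """
    # state -> action -> list of observed (reward, next_state, done) outcomes
    graph = defaultdict(lambda: defaultdict(list))
    for traj in trajectories:
        for s, a, r, s_next, done in traj:
            graph[s][a].append((r, s_next, done))
    return graph


def graph_value_iteration(graph, gamma=0.9, iters=100):
    """Tabular Q-value iteration over the empirical transition graph (illustrative)."""
    q = {s: {a: 0.0 for a in acts} for s, acts in graph.items()}

    def v(s):
        # Nodes with no recorded outgoing edges (never expanded) are valued at 0.
        return max(q[s].values()) if s in q else 0.0

    for _ in range(iters):
        new_q = {}
        for s, acts in graph.items():
            new_q[s] = {}
            for a, outcomes in acts.items():
                # Average the one-step backup over every observed outcome of (s, a).
                total = sum(r + (0.0 if done else gamma * v(s2)) for r, s2, done in outcomes)
                new_q[s][a] = total / len(outcomes)
        q = new_q
    return q
```

For example, suppose one trajectory reaches a rewarding terminal transition from state `"B"` and a second, shorter trajectory only reaches `"B"` from `"D"`. Because both trajectories share the node `"B"`, the value learned from the first trajectory propagates back to `"D"`. With `gamma=0.9`, this gives `Q("D", "up") = 0.9`, even though the second trajectory never observed a reward.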
