June 7, 2022, 1:10 a.m. | Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, Alan Fern

cs.LG updates on arXiv.org arxiv.org

Reinforcement learning (RL) agents are commonly evaluated via their expected
value over a distribution of test scenarios. Unfortunately, this evaluation
approach provides limited evidence for post-deployment generalization beyond
the test distribution. In this paper, we address this limitation by extending
the recent CheckList testing methodology from natural language processing to
planning-based RL. Specifically, we consider testing RL agents that make
decisions via online tree search using a learned transition model and value
function. The key idea is to improve the …

ai arxiv planning rl testing value

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US