June 3, 2022, 1:10 a.m. | Wonjoon Goo, Scott Niekum

cs.LG updates on arXiv.org

We introduce an offline reinforcement learning (RL) algorithm that explicitly
clones a behavior policy to constrain value learning. In offline RL, it is
often important to prevent a policy from selecting unobserved actions, since
the consequences of these actions cannot be inferred without additional
information about the environment. One straightforward way to implement such a
constraint is to explicitly model a given data distribution via behavior
cloning and directly force a policy not to select uncertain actions. However,
many offline …
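To make the constraint described above concrete, here is a minimal sketch of the general idea, not the paper's actual algorithm: estimate the behavior policy from the dataset (here with simple count-based behavior cloning over a discrete state/action space) and then restrict greedy action selection to actions the behavior policy assigns sufficient probability. All function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def fit_behavior_policy(transitions, n_states, n_actions):
    """Estimate pi_b(a|s) from (state, action) pairs by counting."""
    counts = np.zeros((n_states, n_actions))
    for s, a in transitions:
        counts[s, a] += 1
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid divide-by-zero for unseen states
    return counts / totals

def constrained_greedy(q_values, behavior_probs, threshold=0.05):
    """Pick argmax_a Q(s, a) among actions with pi_b(a|s) >= threshold."""
    masked = np.where(behavior_probs >= threshold, q_values, -np.inf)
    return int(np.argmax(masked))

# Toy data: in state 0 the behavior policy only ever takes action 1.
transitions = [(0, 1), (0, 1), (0, 1)]
pi_b = fit_behavior_policy(transitions, n_states=1, n_actions=3)

# Q erroneously favors the unobserved action 2; the constraint masks it,
# so the constrained greedy policy falls back to the observed action 1.
q = np.array([0.0, 1.0, 5.0])
print(constrained_greedy(q, pi_b[0]))  # prints 1
```

In continuous-action settings the counting step would be replaced by a learned density model of the behavior policy, but the mechanism — blocking actions the data does not support during value-based policy extraction — is the same.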

