June 29, 2022, 1:10 a.m. | Ruiquan Huang, Jing Yang, Yingbin Liang

cs.LG updates on arXiv.org

While the primary goal of the exploration phase in reward-free reinforcement
learning (RF-RL) is to reduce the uncertainty in the estimated model with a
minimum number of trajectories, in practice the agent often needs to abide by
certain safety constraints at the same time. It remains unclear how such a
safe exploration requirement would affect the corresponding sample complexity
needed to achieve the desired optimality of the obtained policy in planning.
In this work, we make a first attempt to answer this question. …
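To make the two-phase setup concrete, here is a minimal illustrative sketch of reward-free exploration followed by planning in a tabular MDP, with a naive stand-in for safe exploration (restricting the agent to an assumed-known set of safe actions). This is not the paper's algorithm or its actual safety constraint; all names (`explore`, `plan`, `safe_actions`, the MDP sizes) are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical tabular MDP: S states, A actions, horizon H.
# safe_actions[s] is an ASSUMED-known set of actions treated as safe in
# state s -- a crude proxy for the paper's safety constraint, which is
# more general and must be satisfied during exploration itself.
rng = np.random.default_rng(0)
S, A, H = 5, 3, 10
P_true = rng.dirichlet(np.ones(S), size=(S, A))           # true transitions (S, A, S)
safe_actions = {s: list(range(A - 1)) for s in range(S)}  # assumption for the sketch

def explore(num_trajectories):
    """Reward-free exploration: collect transition counts without any
    reward signal, taking only actions deemed safe."""
    counts = np.zeros((S, A, S))
    for _ in range(num_trajectories):
        s = 0
        for _ in range(H):
            a = rng.choice(safe_actions[s])       # respect the (proxy) constraint
            s_next = rng.choice(S, p=P_true[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def estimate_model(counts):
    """Empirical transition model from the exploration data."""
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)

def plan(P_hat, reward):
    """Finite-horizon value iteration on the estimated model, for an
    arbitrary reward revealed only at planning time -- the defining
    feature of the reward-free setting."""
    V = np.zeros(S)
    for _ in range(H):
        Q = reward + P_hat @ V                    # Q has shape (S, A)
        V = Q.max(axis=1)
    return V

counts = explore(num_trajectories=2000)
P_hat = estimate_model(counts)
reward = rng.uniform(size=(S, A))                 # reward revealed after exploration
print("estimated V(s0):", plan(P_hat, reward)[0])
```

The sample-complexity question the abstract raises is, in these terms, how many exploration trajectories are needed so that the policy planned on `P_hat` is near-optimal for any reward, and how the safety restriction on the exploration policy changes that count.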

