June 29, 2022, 1:11 a.m. | Ruiquan Huang, Jing Yang, Yingbin Liang

stat.ML updates on arXiv.org

While the primary goal of the exploration phase in reward-free reinforcement
learning (RF-RL) is to reduce the uncertainty in the estimated model with the
minimum number of trajectories, in practice the agent often needs to abide by
certain safety constraints at the same time. It remains unclear how such a safe
exploration requirement would affect the sample complexity required to
achieve the desired optimality of the policy obtained in planning. In this
work, we make a first attempt to answer this question. …
