Web: http://arxiv.org/abs/2209.07676

Sept. 19, 2022, 1:11 a.m. | Shenao Zhang

cs.LG updates on arXiv.org arxiv.org

Provably efficient Model-Based Reinforcement Learning (MBRL) based on
optimism or posterior sampling (PSRL) is ensured to attain the global
optimality asymptotically by introducing the complexity measure of the model.
However, the complexity might grow exponentially for the simplest nonlinear
models, where global convergence is impossible within finite iterations. When
the model suffers a large generalization error, which is quantitatively
measured by the model complexity, the uncertainty can be large. The sampled
model that current policy is greedily optimized upon will …

arxiv optimization policy reinforcement reinforcement learning

More from arxiv.org / cs.LG updates on arXiv.org

Machine Learning Product Manager (Canada, Remote)

@ FreshBooks | Canada

Data Engineer

@ Amazon.com | Irvine, California, USA

Senior Autonomy Behavior II, Performance Assessment Engineer

@ Cruise LLC | San Francisco, CA

Senior Data Analytics Engineer

@ Intercom | Dublin, Ireland

Data Analyst Intern

@ ADDX | Singapore

Data Science Analyst - Consumer

@ Yelp | London, England, United Kingdom

Senior Data Analyst - Python+Hadoop

@ Capco | India - Bengaluru

DevOps Engineer, Data Team

@ SingleStore | Hyderabad, India

Software Engineer (Machine Learning, AI Platform)

@ Phaidra | Remote

Sr. UI/UX Designer - Artificial Intelligence (ID:1213)

@ Truelogic Software | Remote, anywhere in LATAM

Analytics Engineer

@ carwow | London, England, United Kingdom

HRIS Data Analyst

@ SecurityScorecard | Remote