Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning. (arXiv:2209.07676v1 [cs.LG])
Sept. 19, 2022, 1:11 a.m. | Shenao Zhang
cs.LG updates on arXiv.org
Provably efficient Model-Based Reinforcement Learning (MBRL) based on
optimism or posterior sampling (PSRL) is guaranteed to attain global
optimality asymptotically by introducing a complexity measure of the model.
However, this complexity might grow exponentially even for the simplest
nonlinear models, in which case global convergence is impossible within
finite iterations. When the model suffers a large generalization error, which
is quantitatively measured by the model complexity, the uncertainty can be
large. The sampled model that the current policy is greedily optimized upon will …