Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations
April 11, 2024, 4:43 a.m. | Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Zhiming Zheng
cs.LG updates on arXiv.org
Abstract: The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have used offline demonstrations to achieve impressive results on multiple hard tasks. However, these approaches place high demands on demonstration quality, and obtaining expert-like actions is often costly and unrealistic. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG), which leverages a small set of state-only demonstrations (where only …