April 11, 2024, 4:43 a.m. | Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Zhiming Zheng

cs.LG updates on arXiv.org

arXiv:2401.00162v2 Announce Type: replace
Abstract: The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized offline demonstrations to achieve impressive results in multiple hard tasks. However, these approaches place high demands on demonstration quality, and obtaining expert-like actions is often costly and unrealistic. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG), which leverages a small set of state-only demonstrations (where only …
