June 6, 2024, 4:43 a.m. | Yu Zhang, Rui Yu, Zhipeng Yao, Wenyuan Zhang, Jun Wang, Liming Zhang

cs.LG updates on arXiv.org

arXiv:2406.03324v1 Announce Type: new
Abstract: The Mean Square Error (MSE) is commonly used to estimate the optimal value function in the vast majority of offline reinforcement learning (RL) models and has achieved outstanding performance. However, we find that its principle can lead to overestimation of the value function. In this paper, we first theoretically analyze the overestimation caused by MSE and provide a theoretical upper bound on the overestimation error. Furthermore, to address it, we propose …
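
For context, the following is a minimal sketch of the kind of MSE-based value loss the abstract refers to, assuming a standard Q-learning style bootstrap target. The function name `mse_td_loss` and its parameters are illustrative assumptions and do not come from the paper; the paper's own analysis and proposed remedy are not visible in this truncated abstract and are not reproduced here.

```python
def mse_td_loss(q_values, rewards, next_q_values, dones, gamma=0.99):
    """Mean squared Bellman error over a batch of transitions.

    Assumed (hypothetical) inputs, one entry per transition:
      q_values      -- current estimate Q(s, a) for the action taken
      rewards       -- observed reward r
      next_q_values -- list of Q(s', a') estimates over all next actions
      dones         -- True if s' is terminal
    """
    total = 0.0
    for q, r, next_q, done in zip(q_values, rewards, next_q_values, dones):
        # Bootstrap target: r + gamma * max_a' Q(s', a'), or just r at terminal states.
        target = r + gamma * (0.0 if done else max(next_q))
        total += (q - target) ** 2
    return total / len(q_values)
```

Minimizing this squared gap pulls each `Q(s, a)` toward its bootstrapped target, which is the general setup in which the abstract's overestimation claim is stated.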
