June 21, 2024, 4:46 a.m. | Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey

cs.LG updates on arXiv.org

arXiv:2406.13376v1 Announce Type: new
Abstract: Recent work has demonstrated both the benefits and the limitations of using supervised approaches (without temporal-difference learning) for offline reinforcement learning. While off-policy reinforcement learning offers a promising route to improving performance beyond supervised approaches, we observe that its training is often inefficient and unstable due to temporal-difference bootstrapping. In this paper, we propose a best-of-both approach: first learning the behavior policy and critic with supervised learning, before improving them with off-policy reinforcement learning. Specifically, we demonstrate …

Tags: arXiv, cs.LG, offline reinforcement learning
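The two-phase recipe in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's algorithm (which the truncated abstract does not fully specify); it is a hedged illustration of the general idea: Phase 1 fits a behavior policy and a critic with pure supervised learning (behavior cloning and Monte Carlo return regression, no bootstrapping), and Phase 2 refines the critic with off-policy TD updates (here, Q-learning) on the same fixed dataset. All function names and the episode format are assumptions for this example.

```python
# Hedged sketch, not the paper's exact method: tabular "supervised pretrain,
# then off-policy fine-tune" pipeline for offline RL.
import numpy as np

def supervised_pretrain(episodes, n_states, n_actions, gamma=0.99):
    """Phase 1: behavior-cloned policy + critic regressed on Monte Carlo
    returns (no temporal-difference bootstrapping)."""
    counts = np.zeros((n_states, n_actions))
    q_sum = np.zeros((n_states, n_actions))
    # episodes: list of episodes, each a list of (state, action, reward) steps.
    for episode in episodes:
        ret = 0.0
        targets = []
        for s, a, r in reversed(episode):
            ret = r + gamma * ret  # discounted Monte Carlo return
            targets.append((s, a, ret))
        for s, a, g in targets:
            counts[s, a] += 1
            q_sum[s, a] += g
    # Behavior policy = empirical action frequencies per state.
    policy = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    # Critic = mean Monte Carlo return per (state, action) pair.
    q = np.divide(q_sum, counts, out=np.zeros_like(q_sum), where=counts > 0)
    return policy, q

def offpolicy_finetune(episodes, q, gamma=0.99, lr=0.1, iters=50):
    """Phase 2: off-policy Q-learning on the fixed dataset, initialized
    from the supervised critic instead of from scratch."""
    q = q.copy()
    for _ in range(iters):
        for episode in episodes:
            for t, (s, a, r) in enumerate(episode):
                if t + 1 < len(episode):
                    next_s = episode[t + 1][0]
                    target = r + gamma * q[next_s].max()  # bootstrapped target
                else:
                    target = r  # terminal step: no bootstrap
                q[s, a] += lr * (target - q[s, a])
    return q
```

The supervised warm start gives the TD phase a critic that already reflects the dataset's returns, which is one way to mitigate the instability that bootstrapping from a random initialization can cause.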
