Feb. 9, 2024, 5:42 a.m. | Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Phi

cs.LG updates on arXiv.org

We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong supervised behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior across 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, …
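
To make the architectural idea concrete, here is a minimal sketch of a Perceiver-style actor-critic in PyTorch. It is illustrative only: the paper's actual architecture, observation tokenization, dimensions, and training losses are not given in this summary, so every module name and hyperparameter below is a hypothetical placeholder. The sketch only shows the structural pattern the abstract describes: a learned latent array that cross-attends to observation tokens, self-attention over the latents, and separate actor and critic heads.

```python
# Illustrative sketch only; not the paper's implementation.
import torch
import torch.nn as nn


class PerceiverActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, num_latents=32, latent_dim=128, n_heads=4):
        super().__init__()
        # Learned latent array that queries the observation tokens (Perceiver-style).
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.obs_proj = nn.Linear(obs_dim, latent_dim)
        # Cross-attention: latents attend to the observation tokens.
        self.cross_attn = nn.MultiheadAttention(latent_dim, n_heads, batch_first=True)
        # Self-attention over the latent array.
        self.self_attn = nn.TransformerEncoderLayer(
            latent_dim, n_heads, dim_feedforward=4 * latent_dim, batch_first=True
        )
        # Separate heads for the policy (actor) and the Q-value (critic).
        self.actor_head = nn.Linear(latent_dim, act_dim)
        self.critic_head = nn.Linear(latent_dim + act_dim, 1)

    def forward(self, obs_tokens, action=None):
        # obs_tokens: (batch, num_tokens, obs_dim), e.g. proprioception + vision features.
        b = obs_tokens.shape[0]
        tokens = self.obs_proj(obs_tokens)
        latents = self.latents.unsqueeze(0).expand(b, -1, -1)
        latents, _ = self.cross_attn(latents, tokens, tokens)
        latents = self.self_attn(latents)
        pooled = latents.mean(dim=1)
        action_out = self.actor_head(pooled)          # mean of a continuous action
        q_value = None
        if action is not None:
            q_value = self.critic_head(torch.cat([pooled, action], dim=-1))
        return action_out, q_value


if __name__ == "__main__":
    model = PerceiverActorCritic(obs_dim=64, act_dim=8)
    obs = torch.randn(2, 16, 64)   # batch of 2, 16 observation tokens each
    act = torch.randn(2, 8)        # continuous actions for the critic
    action_out, q = model(obs, act)
    print(action_out.shape, q.shape)  # torch.Size([2, 8]) torch.Size([2, 1])
```

In an offline actor-critic setup, a model like this would be trained purely from a fixed dataset: the critic head regresses value targets and the actor head is improved against the critic (or regularized toward the data), with no environment interaction during training.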

