Feb. 9, 2024, 5:42 a.m. | Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Phi

cs.LG updates on arXiv.org

We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong supervised behavioral-cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, …
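The abstract does not include implementation details, so as a rough orientation only, here is a minimal sketch of a single offline actor-critic update in PyTorch, using a TD3+BC-style behavioral-cloning regularizer. The network sizes, hyperparameters, and losses below are assumptions for illustration, not the Perceiver-based model or training objective from the paper.

```python
# Minimal offline actor-critic update sketch (TD3+BC-style regularization).
# All shapes, networks, and hyperparameters are illustrative assumptions,
# not the architecture or losses used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
gamma, bc_weight = 0.99, 2.5

def update(batch):
    obs, act, rew, next_obs, done = batch

    # Critic: regress Q(s, a) onto a one-step TD target built purely from logged transitions.
    with torch.no_grad():
        next_act = actor(next_obs)
        target_q = rew + gamma * (1.0 - done) * critic(torch.cat([next_obs, next_act], -1))
    q = critic(torch.cat([obs, act], -1))
    critic_loss = F.mse_loss(q, target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize Q while staying close to the dataset actions; the
    # behavioral-cloning term keeps the policy from exploiting critic errors offline.
    pi = actor(obs)
    actor_loss = -critic(torch.cat([obs, pi], -1)).mean() + bc_weight * F.mse_loss(pi, act)
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()

# Dummy batch of logged transitions, standing in for an offline dataset.
B = 32
batch = (torch.randn(B, obs_dim), torch.rand(B, act_dim) * 2 - 1,
         torch.randn(B, 1), torch.randn(B, obs_dim), torch.zeros(B, 1))
print(update(batch))
```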

Tags: actor-critic, algorithms, behavior cloning, cs.AI, cs.LG, cs.RO, dataset, expert behavior, large models, offline reinforcement learning, scaling laws, supervised learning, training, transformers
