Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
April 20, 2022, 9 a.m.
The Berkeley Artificial Intelligence Research Blog bair.berkeley.edu
A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers!
Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. However, many recent algorithms reframe RL as a supervised learning problem. These algorithms learn conditional policies by conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et …
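To make the "RL as supervised learning" framing concrete, here is a minimal sketch of how an offline trajectory could be turned into a supervised dataset when conditioning on reward-to-go. It is an illustration only, not the authors' implementation: the function names `reward_to_go` and `build_supervised_dataset`, the toy trajectory, and the choice of an undiscounted sum are all assumptions made for this example.

```python
import numpy as np


def reward_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the trajectory."""
    rewards = np.asarray(rewards, dtype=float)
    # Reverse, cumulative-sum, reverse again: rtg[t] = rewards[t] + ... + rewards[T-1]
    return np.cumsum(rewards[::-1])[::-1]


def build_supervised_dataset(states, actions, rewards):
    """Pair each (state, reward-to-go) input with the action taken in the data.

    A conditional policy can then be fit to these pairs with ordinary
    supervised learning (for example, a small MLP trained on the inputs).
    """
    states = np.asarray(states, dtype=float)
    rtg = reward_to_go(rewards)[:, None]             # shape (T, 1)
    inputs = np.concatenate([states, rtg], axis=1)   # shape (T, state_dim + 1)
    targets = np.asarray(actions)
    return inputs, targets


# Hypothetical three-step trajectory with 2-D states and discrete actions.
states = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
actions = [1, 0, 1]
rewards = [0.0, 1.0, 2.0]

inputs, targets = build_supervised_dataset(states, actions, rewards)
print(inputs)
print(targets)
```

Conditioning on a goal state instead would follow the same pattern, with a future state from the trajectory appended to each input in place of the reward-to-go column.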