April 24, 2023, 12:44 a.m. | Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine

cs.LG updates on arXiv.org

Effective offline RL methods require properly handling out-of-distribution
actions. Implicit Q-learning (IQL) addresses this by training a Q-function
using only dataset actions through a modified Bellman backup. However, it is
unclear which policy actually attains the values represented by this implicitly
trained Q-function. In this paper, we reinterpret IQL as an actor-critic method
by generalizing the critic objective and connecting it to a
behavior-regularized implicit actor. This generalization shows how the induced
actor balances reward maximization and divergence from the behavior policy. …
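The "modified Bellman backup" the abstract refers to is, in the original IQL paper (Kostrikov et al., 2022), an expectile regression that pulls a value function V(s) toward an upper expectile of Q(s, a) evaluated only on dataset actions. Below is a minimal NumPy sketch of that asymmetric loss, assuming this reading; the function name and toy values are illustrative, not from this paper.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss used in IQL-style value updates.

    diff = Q(s, a) - V(s). With tau > 0.5, positive errors are weighted
    more heavily, so V(s) is pushed toward an upper expectile of Q over
    dataset actions rather than a max over all (possibly OOD) actions.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return np.mean(weight * diff ** 2)

# Illustrative usage with toy numbers (not from the paper):
q_values = np.array([1.0, 2.0, 0.5])   # Q(s, a) on dataset actions
v_values = np.array([0.8, 1.5, 0.9])   # current V(s) estimates
print(expectile_loss(q_values - v_values, tau=0.7))
```

Setting tau = 0.5 recovers ordinary mean-squared regression; larger tau makes the implicit backup increasingly optimistic while still never querying out-of-distribution actions.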
