IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies. (arXiv:2304.10573v1 [cs.LG])
cs.LG updates on arXiv.org
Effective offline RL methods require properly handling out-of-distribution
actions. Implicit Q-learning (IQL) addresses this by training a Q-function
using only dataset actions through a modified Bellman backup. However, it is
unclear which policy actually attains the values represented by this implicitly
trained Q-function. In this paper, we reinterpret IQL as an actor-critic method
by generalizing the critic objective and connecting it to a
behavior-regularized implicit actor. This generalization shows how the induced
actor balances reward maximization and divergence from the …
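The "modified Bellman backup" the abstract refers to is IQL's expectile regression: the value function is fit to an upper expectile of the Q-values over dataset actions, which approximates a maximum over in-distribution actions without ever evaluating out-of-distribution ones. A minimal NumPy sketch of that asymmetric critic loss (the function name and toy arrays are illustrative, not from the paper):

```python
import numpy as np

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric (expectile) loss used to train V in IQL.

    Errors where Q > V are weighted by tau, errors where Q <= V by
    (1 - tau). With tau > 0.5, V is pushed toward the upper tail of
    the Q-values over dataset actions, approximating an in-distribution
    maximum as tau -> 1.
    """
    diff = q_values - v_values
    weight = np.where(diff > 0, tau, 1.0 - tau)  # asymmetric weighting
    return np.mean(weight * diff ** 2)

# Toy illustration: with tau = 0.9, setting V at the mean of Q is
# penalized, since underestimating high Q-values costs more.
q = np.array([1.0, 2.0, 3.0])
v = np.array([1.5, 1.5, 1.5])
loss = expectile_loss(q, v, tau=0.9)
```

With `tau = 0.5` this reduces to ordinary (scaled) mean-squared error; the paper's generalization of this critic objective is what lets the authors identify the implicit actor it induces.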