all AI news
Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching. (arXiv:2311.01331v1 [cs.LG])
cs.LG updates on arXiv.org arxiv.org
In real-world scenarios, arbitrary interactions with the environment can
often be costly, and actions of expert demonstrations are not always available.
To reduce the need for both, Offline Learning from Observations (LfO) is
extensively studied, where the agent learns to solve a task with only expert
states and \textit{task-agnostic} non-expert state-action pairs. The
state-of-the-art DIstribution Correction Estimation (DICE) methods minimize the
state occupancy divergence between the learner and expert policies. However,
they are limited to either $f$-divergences (KL and $\chi^2$) …
agent arxiv environment expert interactions observation offline primal reduce solve state world