May 20, 2022, 1:11 a.m. | Wei-Di Chang, Juan Camilo Gamboa Higuera, Scott Fujimoto, David Meger, Gregory Dudek

cs.LG updates on arXiv.org

We present an algorithm for Inverse Reinforcement Learning (IRL) from expert
state observations only. Our approach decouples reward modelling from policy
learning, unlike state-of-the-art adversarial methods, which must update the
reward model during policy search and are known to be unstable and difficult
to optimize. Our method, IL-flOw, recovers the expert policy by modelling
state-to-state transitions: rewards are generated by deep density estimators
trained on the demonstration trajectories, which avoids the instability issues
of adversarial methods. We demonstrate that using the …

Tags: arxiv, flow, imitation learning, learning, observation
