Feb. 28, 2024, 5:43 a.m. | Xiaoyu Zhang, Matthew Chang, Pranav Kumar, Saurabh Gupta

cs.LG updates on arXiv.org

arXiv:2402.17768v1 Announce Type: cross
Abstract: A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that were not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger approach to this problem simply collects more data to cover these failure states. However, in practice, this is often prohibitively expensive. In this work, we propose Diffusion Meets DAgger (DMD), a method to …
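For context, the classic DAgger loop that the abstract builds on can be sketched as follows. This is a toy illustration with hypothetical stand-ins for the expert, policy, and dynamics (it is not DMD itself, whose details are truncated above): the key idea is rolling out the *learned* policy so it visits its own failure states, while labeling those states with expert actions and retraining on the aggregated dataset.

```python
# Toy sketch of the DAgger (Dataset Aggregation) loop.
# expert_action, train_policy, and the 1-D dynamics are hypothetical stand-ins.
import random

def expert_action(state):
    # Hypothetical expert: steer the state back toward 0.
    return -1 if state > 0 else 1

def train_policy(dataset):
    # Toy "training": majority vote over expert labels seen for each side of 0.
    def policy(state):
        labels = [a for s, a in dataset if (s > 0) == (state > 0)]
        return max(set(labels), key=labels.count) if labels else 0
    return policy

def dagger(n_iters=3, horizon=5):
    dataset = []                        # aggregated (state, expert_action) pairs
    policy = lambda s: 0                # initial untrained policy
    for _ in range(n_iters):
        state = random.uniform(-1, 1)
        for _ in range(horizon):
            # Roll out the learned policy so it reaches its own failure states,
            # but record the expert's label for every visited state.
            dataset.append((state, expert_action(state)))
            state += 0.1 * policy(state)
        policy = train_policy(dataset)  # retrain on the aggregated dataset
    return policy, dataset
```

DMD's contribution, per the abstract, is avoiding the expensive data collection this loop requires; the loop above only shows the baseline it improves on.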

