June 25, 2024, 4:49 a.m. | Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Peter Stone, Masayoshi Tomizuka, Wei Zhan

cs.LG updates on arXiv.org

arXiv:2406.16258v1 Announce Type: cross
Abstract: Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. …
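The intervention-based feedback loop the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's method: the toy environment, the `prior_policy`, the expert's takeover rule, and the 1-D goal at 0 are all hypothetical, chosen only to show how execution proceeds under the prior policy while the expert steps in when behavior drifts, with only the intervention samples retained as the learning signal.

```python
import random


def prior_policy(state):
    # Suboptimal prior: always moves +1, drifting away from the goal at 0.
    return 1.0


def expert_action(state):
    # Expert steers the state back toward the goal at 0.
    return -0.5 * state


def expert_intervenes(state, threshold=2.0):
    # The expert takes over only when execution drifts too far from the goal.
    return abs(state) > threshold


def rollout_with_interventions(steps=50, seed=0):
    rng = random.Random(seed)
    state = 0.0
    intervention_data = []  # (state, expert_action) pairs: the feedback signal
    for _ in range(steps):
        if expert_intervenes(state):
            action = expert_action(state)
            intervention_data.append((state, action))
        else:
            action = prior_policy(state)
        state += action + rng.gauss(0.0, 0.1)
    return intervention_data


data = rollout_with_interventions()
```

Because samples are collected only when the expert intervenes, the dataset concentrates on states where the prior policy misaligns with human preferences; MEReQ's contribution, per the abstract, is to exploit the prior policy so that fewer such interventions are needed.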

