all AI news
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
June 25, 2024, 4:49 a.m. | Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Peter Stone, Masayoshi Tomizuka, Wei Zhan
cs.LG updates on arXiv.org
Abstract: Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently, which hinders sample-efficient learning. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. …
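The abstract above is truncated, so the following is only an illustrative sketch of the general idea it describes: learning a residual Q-function on top of a fixed prior policy, where a human intervention is treated as a signal that the prior's action was undesirable. The toy environment, the greedy combined policy, and the use of a fixed −1 penalty as a stand-in for MEReQ's inferred residual reward function are all assumptions for demonstration, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's implementation: a residual Q-term is
# learned on top of a frozen prior policy's Q-values, driven by a placeholder
# residual reward that fires when the "human" intervenes.

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Fixed prior policy values: the prior prefers action 0 in every state.
prior_q = np.tile(np.array([1.0, 0.0]), (n_states, 1))
residual_q = np.zeros((n_states, n_actions))  # learned residual term

def policy(s):
    # Act greedily with respect to the combined value: prior + residual.
    return int(np.argmax(prior_q[s] + residual_q[s]))

def residual_update(s, a, r_res, s_next, alpha=0.1, gamma=0.9):
    # One TD(0) step nudging the combined Q toward the residual-reward target.
    target = r_res + gamma * np.max(prior_q[s_next] + residual_q[s_next])
    residual_q[s, a] += alpha * (target - (prior_q[s, a] + residual_q[s, a]))

# Toy interaction loop: the "human expert" intervenes (disagrees) whenever
# the policy takes action 0, producing a residual penalty.
for _ in range(2000):
    s = int(rng.integers(n_states))
    a = policy(s)
    r_res = -1.0 if a == 0 else 0.0  # stand-in for an inferred residual reward
    s_next = int(rng.integers(n_states))
    residual_update(s, a, r_res, s_next)

# After training, the combined policy avoids the penalized action.
print([policy(s) for s in range(n_states)])  # → [1, 1, 1, 1]
```

The point of the residual formulation is that only the correction term is learned: the prior policy's values stay fixed, so feedback is spent on the gap between the prior's behavior and the human's preference rather than on relearning the whole task.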