all AI news
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
June 25, 2024, 4:49 a.m. | Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Peter Stone, Masayoshi Tomizuka, Wei Zhan
cs.LG updates on arXiv.org
Abstract: Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently, which hinders sample-efficient learning. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. …
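The abstract above is truncated, so the following is only an illustrative sketch of the general idea it describes: learning a residual Q-function on top of a fixed prior policy, where a human intervention is treated as a signal that the prior's action was undesirable. The toy environment, the greedy combined policy, and the use of a fixed −1 penalty as a stand-in for MEReQ's inferred residual reward function are all assumptions for demonstration, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's implementation: a residual Q-term is
# learned on top of a frozen prior policy's Q-values, driven by a placeholder
# residual reward that fires when the "human" intervenes.

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Fixed prior policy values: the prior prefers action 0 in every state.
prior_q = np.tile(np.array([1.0, 0.0]), (n_states, 1))
residual_q = np.zeros((n_states, n_actions))  # learned residual term

def policy(s):
    # Act greedily with respect to the combined value: prior + residual.
    return int(np.argmax(prior_q[s] + residual_q[s]))

def residual_update(s, a, r_res, s_next, alpha=0.1, gamma=0.9):
    # One TD(0) step nudging the combined Q toward the residual-reward target.
    target = r_res + gamma * np.max(prior_q[s_next] + residual_q[s_next])
    residual_q[s, a] += alpha * (target - (prior_q[s, a] + residual_q[s, a]))

# Toy interaction loop: the "human expert" intervenes (disagrees) whenever
# the policy takes action 0, producing a residual penalty.
for _ in range(2000):
    s = int(rng.integers(n_states))
    a = policy(s)
    r_res = -1.0 if a == 0 else 0.0  # stand-in for an inferred residual reward
    s_next = int(rng.integers(n_states))
    residual_update(s, a, r_res, s_next)

# After training, the combined policy avoids the penalized action.
print([policy(s) for s in range(n_states)])  # → [1, 1, 1, 1]
```

The point of the residual formulation is that only the correction term is learned: the prior policy's values stay fixed, so feedback is spent on the gap between the prior's behavior and the human's preference rather than on relearning the whole task.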