Feb. 2, 2024, 3:42 p.m. | Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

cs.CV updates on arXiv.org

With the rapid development of large language models, embodied intelligence has attracted increasing attention. Nevertheless, prior work on embodied intelligence typically encodes scene or historical memory in a unimodal manner, either visual or linguistic, which complicates the alignment of the model's action planning with embodied control. To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions. Specifically, we propose a novel …
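The core interface the abstract describes — a natural-language task going in, a sequence of executable actions coming out — can be sketched as follows. This is a hypothetical illustration only, not MEIA's actual method: the `Action` record, the `plan_actions` helper, and the prompt format are all assumptions, and the `llm` callable stands in for whatever language model backs the agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record for one primitive action an embodied controller could run.
@dataclass
class Action:
    name: str
    args: Dict[str, str]

def plan_actions(task: str, llm: Callable[[str], str]) -> List[Action]:
    """Translate a high-level natural-language task into executable actions.

    `llm` is any function mapping a prompt to a newline-separated list of
    "action(arg=value, ...)" lines; a real agent would call a language model.
    """
    raw = llm(f"Decompose into primitive actions: {task}")
    actions = []
    for line in raw.strip().splitlines():
        # Split "navigate(target=kitchen)" into a name and its arguments.
        name, _, argstr = line.partition("(")
        args = {}
        for pair in argstr.rstrip(")").split(","):
            if "=" in pair:
                key, _, value = pair.partition("=")
                args[key.strip()] = value.strip()
        actions.append(Action(name=name.strip(), args=args))
    return actions

# Usage with a stub standing in for the real model:
stub = lambda prompt: "navigate(target=kitchen)\npick(object=cup)"
plan = plan_actions("Bring me a cup from the kitchen", stub)
```

The point of the sketch is the contract, not the parser: downstream embodied control consumes a structured action list regardless of which modality the model used to build its memory.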

