Feb. 2, 2024, 3:42 p.m. | Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

cs.CV updates on arXiv.org

With the rapid development of large language models, embodied intelligence has attracted increasing attention. Nevertheless, prior work on embodied intelligence typically encodes scene or historical memory in a unimodal manner, either visual or linguistic, which complicates the alignment of the model's action planning with embodied control. To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions. Specifically, we propose a novel …
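The core interface the abstract describes — a natural-language task going in, a sequence of executable actions coming out — can be sketched as follows. This is a hypothetical illustration only, not MEIA's actual method: the `Action` record, the `plan_actions` helper, and the prompt format are all assumptions, and the `llm` callable stands in for whatever language model backs the agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record for one primitive action an embodied controller could run.
@dataclass
class Action:
    name: str
    args: Dict[str, str]

def plan_actions(task: str, llm: Callable[[str], str]) -> List[Action]:
    """Translate a high-level natural-language task into executable actions.

    `llm` is any function mapping a prompt to a newline-separated list of
    "action(arg=value, ...)" lines; a real agent would call a language model.
    """
    raw = llm(f"Decompose into primitive actions: {task}")
    actions = []
    for line in raw.strip().splitlines():
        # Split "navigate(target=kitchen)" into a name and its arguments.
        name, _, argstr = line.partition("(")
        args = {}
        for pair in argstr.rstrip(")").split(","):
            if "=" in pair:
                key, _, value = pair.partition("=")
                args[key.strip()] = value.strip()
        actions.append(Action(name=name.strip(), args=args))
    return actions

# Usage with a stub standing in for the real model:
stub = lambda prompt: "navigate(target=kitchen)\npick(object=cup)"
plan = plan_actions("Bring me a cup from the kitchen", stub)
```

The point of the sketch is the contract, not the parser: downstream embodied control consumes a structured action list regardless of which modality the model used to build its memory.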

