LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
June 18, 2024, 4:49 a.m. | Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quenum, Yutong Bai, Baifeng Shi, Trevor Darrell, Roei Herzig
cs.LG updates on arXiv.org
Abstract: In recent years, instruction-tuned Large Multimodal Models (LMMs) have been successful at several tasks, including image captioning and visual question answering; yet leveraging these models remains an open question for robotics. Prior LMMs for robotics applications have been extensively trained on language and action data, but their ability to generalize in different settings has often been less than desired. To address this, we introduce LLARVA, a model trained with a novel instruction tuning method that …