March 13, 2024, 4:47 a.m. | Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

cs.CL updates on

arXiv:2403.07376v1 Announce Type: cross
Abstract: Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. However, their predominant use in an offline manner usually suffers from substantial domain gap between the VLN task and the LLM training corpus. This paper introduces a novel strategy …

arxiv boosting language llm navigation reasoning type via vision vision-and-language

Senior Data Engineer

@ Displate | Warsaw

Solution Architect

@ Philips | Bothell - B2 - Bothell 22050

Senior Product Development Engineer - Datacenter Products

@ NVIDIA | US, CA, Santa Clara

Systems Engineer - 2nd Shift (Onsite)

@ RTX | PW715: Asheville Site W Asheville Greenfield Site TBD , Asheville, NC, 28803 USA

System Test Engineers (HW & SW)

@ Novanta | Barcelona, Spain

Senior Solutions Architect, Energy

@ NVIDIA | US, TX, Remote