Feb. 27, 2024, 5:48 a.m. | Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

cs.CV updates on arXiv.org arxiv.org

arXiv:2401.07314v2 Announce Type: replace-cross
Abstract: Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt the GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment. In this work, we present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage the global exploration. Specifically, we …

abstract agent agents arxiv brain cs.ai cs.cv cs.ro decision embodied environments global gpt gpt-4 language locations making map navigation path planning prompt prompting tasks type view vision vision-and-language zero-shot

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Aumni - Site Reliability Engineer III - MLOPS

@ JPMorgan Chase & Co. | Salt Lake City, UT, United States

Senior Data Analyst

@ Teya | Budapest, Hungary

Technical Analyst (Data Analytics)

@ Contact Government Services | Chicago, IL

Engineer, AI/Machine Learning

@ Masimo | Irvine, CA, United States

Private Bank - Executive Director: Data Science and Client / Business Intelligence

@ JPMorgan Chase & Co. | Mumbai, Maharashtra, India