March 15, 2024, 4:46 a.m. | Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

cs.CV updates on arXiv.org

arXiv:2302.06072v2 Announce Type: replace
Abstract: Vision-Language Navigation (VLN) is a challenging task that requires an agent to align complex visual observations with language instructions to reach the goal position. Most existing VLN agents directly learn to align raw directional and visual features, trained with one-hot labels, to linguistic instruction features. However, the large semantic gap among these multi-modal inputs makes the alignment difficult and therefore limits navigation performance. In this paper, we propose Actional Atomic-Concept Learning (AACL), …
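The cross-modal alignment the abstract refers to can be sketched, in spirit only, as scoring candidate visual features against instruction token features by cosine similarity in a shared embedding space. This is an illustrative sketch, not AACL's actual method; the function name, feature dimensions, and use of raw random features here are all assumptions for demonstration.

```python
import numpy as np

def cosine_align(visual_feats, text_feats):
    """Cosine similarity between each candidate view feature and each
    instruction token feature (illustrative stand-in for VLN alignment)."""
    v = visual_feats / np.linalg.norm(visual_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    # Rows index candidate views, columns index instruction tokens.
    return v @ t.T

rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 16))  # e.g. 4 candidate view features
text = rng.normal(size=(7, 16))    # e.g. 7 instruction token features
scores = cosine_align(visual, text)
print(scores.shape)  # (4, 7) alignment score matrix
```

The agent would then pick the candidate view whose row has the highest score for the relevant instruction tokens; the semantic gap the abstract mentions is exactly what makes such raw-feature similarities unreliable without an intermediate concept space.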

