all AI news
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
March 15, 2024, 4:46 a.m. | Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu
cs.CV updates on arXiv.org arxiv.org
Abstract: Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position. Most existing VLN agents directly learn to align the raw directional features and visual features trained using one-hot labels to linguistic instruction features. However, the big semantic gap among these multi-modal inputs makes the alignment difficult and therefore limits the navigation performance. In this paper, we propose Actional Atomic-Concept Learning (AACL), …
abstract agent agents arxiv concept cs.ai cs.cv features hot however labels language learn navigation raw type vision visual
More from arxiv.org / cs.CV updates on arXiv.org
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
2 days, 1 hour ago |
arxiv.org
Fingerprint Matching with Localized Deep Representation
2 days, 1 hour ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne