Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
March 26, 2024, 4:46 a.m. | Bowen Huang, Yanwei Zheng, Chuanlin Lan, Xinpeng Zhao, Dongxiao Yu, Yifei Zou
cs.CV updates on arXiv.org
Abstract: Vision-and-Language Navigation (VLN) is a challenging task in which an agent must navigate to a location described in natural language using visual observations. The agent's navigation ability can be enhanced by modeling the relations between objects, which are usually learned from internal objects or from external datasets. Traditional studies model the relationships between internal objects with a graph convolutional network (GCN). However, GCNs tend to be shallow, which limits their modeling ability. To address this …