Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation. (arXiv:2202.11742v1 [cs.CV])
cs.CV updates on arXiv.org
Following language instructions to navigate in unseen environments is a
challenging problem for autonomous embodied agents. The agent must not only
ground language in visual scenes but also explore the environment to
reach its target. In this work, we propose a dual-scale graph transformer
(DUET) for joint long-term action planning and fine-grained cross-modal
understanding. We build a topological map on the fly to enable efficient
exploration in a global action space. To balance the complexity of large action
space reasoning and …
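
To make the idea of a topological map built on the fly more concrete, here is a minimal Python sketch. It is not the DUET implementation: the class `TopologicalMap`, its methods, and the node names are all hypothetical, and the sketch assumes only that the agent can see which neighbouring locations are navigable from where it stands. It illustrates one plausible reading of a "global action space": every observed but unvisited node in the map is a candidate target, not just the neighbours of the current node.

```python
from collections import deque


class TopologicalMap:
    """Toy on-the-fly topological map (hypothetical, not the DUET code)."""

    def __init__(self):
        self.edges = {}       # node -> set of neighbouring nodes
        self.visited = set()  # nodes the agent has stood on

    def update(self, current, navigable):
        """Record a visit to `current` and the neighbours observed from it."""
        self.visited.add(current)
        self.edges.setdefault(current, set())
        for nxt in navigable:
            self.edges[current].add(nxt)
            self.edges.setdefault(nxt, set()).add(current)

    def global_actions(self):
        """Every observed-but-unvisited node, anywhere in the map."""
        return sorted(n for n in self.edges if n not in self.visited)

    def shortest_path(self, start, goal):
        """Breadth-first search over the map built so far."""
        parents = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None  # goal not reachable in the current map


# Assumed toy episode: the agent starts at "A", then moves to "B".
topo = TopologicalMap()
topo.update("A", ["B", "C"])
topo.update("B", ["D"])
candidates = topo.global_actions()
path_to_c = topo.shortest_path("B", "C")
```

In this toy episode the agent standing at "B" can still choose "C", which it observed earlier from "A", and the map supplies the route back through "A". How DUET actually scores such candidates is not covered by this excerpt.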