all AI news
Bird's-Eye-View Scene Graph for Vision-Language Navigation. (arXiv:2308.04758v1 [cs.CV])
cs.CV updates on arXiv.org arxiv.org
Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However,
current agents are built upon panoramic observations, which hinders their
ability to perceive 3D scene geometry and easily leads to ambiguous selection
of panoramic view. To address these limitations, we present a BEV Scene Graph
(BSG), which leverages multi-step BEV representations to encode scene layouts
and geometric cues of indoor environment under the supervision of 3D detection.
During navigation, BSG builds …
agents arxiv bird current environments geometry graph human language leads limitations navigation scene geometry vision