Web: http://arxiv.org/abs/2201.10788

Jan. 27, 2022, 2:10 a.m. | Sinan Tan, Mengmeng Ge, Di Guo, Huaping Liu, Fuchun Sun

cs.CV updates on arXiv.org

In the Vision-and-Language Navigation task, an embodied agent follows
linguistic instructions and navigates to a specific goal. It is important in
many practical scenarios and has attracted extensive attention from both the
computer vision and robotics communities. However, most existing works use only
RGB images and neglect the 3D semantic information of the scene. To this end,
we develop a novel self-supervised training framework to encode the voxel-level
3D semantic reconstruction into a 3D semantic representation. Specifically, a
region query task …


