Feb. 8, 2024, 5:47 a.m. | Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoki Kawanabe

cs.CV updates on arXiv.org

We present a novel task, cross-dataset visual grounding in 3D scenes (Cross3DVG), which overcomes limitations of existing 3D visual grounding models: their restricted 3D resources and the resulting tendency to overfit a specific 3D dataset. To facilitate Cross3DVG, we created RIORefer, a large-scale 3D visual grounding dataset. It includes more than 63k diverse, human-annotated descriptions of 3D objects within 1,380 indoor RGB-D scans from 3RScan. After training the Cross3DVG model using the source 3D visual grounding dataset, …


