Web: http://arxiv.org/abs/2103.16561

May 5, 2022, 1:10 a.m. | Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang

cs.CV updates on arXiv.org

Vision-and-language navigation (VLN) is a multimodal task in which an agent
follows natural language instructions to navigate visual environments.
Multiple setups have been proposed, and researchers apply new model
architectures or training techniques to boost navigation performance. However,
non-negligible gaps remain between machine performance and human benchmarks.
Moreover, the agents' inner mechanisms for making navigation decisions remain
unclear. To the best of our knowledge, how agents perceive the multimodal
input is under-studied and needs investigation. In this work, …
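To make the task setup concrete, here is a minimal sketch of one VLN decision step: an instruction encoding is matched against the visual features of candidate viewpoints to pick the next move. The class name (MinimalVLNAgent), dimensions, and dot-product fusion are illustrative assumptions for exposition, not the architecture studied in the paper.

import torch
import torch.nn as nn

class MinimalVLNAgent(nn.Module):
    """Toy VLN decision step (illustrative only): encode the instruction with
    an LSTM, project candidate-view visual features, and score each candidate
    by a dot product with the instruction vector."""

    def __init__(self, vocab_size=1000, embed_dim=256, visual_dim=2048, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.instr_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)

    def forward(self, instr_tokens, candidate_feats):
        # instr_tokens: (batch, seq_len) token ids of the instruction
        # candidate_feats: (batch, num_candidates, visual_dim) features of navigable views
        embedded = self.embed(instr_tokens)
        _, (h, _) = self.instr_encoder(embedded)      # h: (1, batch, hidden_dim)
        instr_vec = h.squeeze(0)                      # (batch, hidden_dim)
        cand = self.visual_proj(candidate_feats)      # (batch, num_candidates, hidden_dim)
        # Score each candidate viewpoint against the instruction encoding
        logits = torch.bmm(cand, instr_vec.unsqueeze(-1)).squeeze(-1)
        return logits                                 # (batch, num_candidates)

# Usage: choose the next viewpoint for a single (hypothetical) episode
agent = MinimalVLNAgent()
tokens = torch.randint(0, 1000, (1, 12))   # a 12-token instruction
feats = torch.randn(1, 6, 2048)            # 6 candidate viewpoints
action = agent(tokens, feats).argmax(dim=-1)  # index of the chosen candidate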

