Web: http://arxiv.org/abs/2206.08522

June 20, 2022, 1:13 a.m. | Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang

cs.CV updates on arXiv.org

Benefiting from the flexibility and compositionality of language, humans
naturally use language to command an embodied agent to perform complex tasks
such as navigation and object manipulation. In this work, we aim to close the
last-mile gap for embodied agents -- object manipulation by following human
guidance, e.g., "move the red mug next to the box while keeping it upright." To
this end, we introduce the Automatic Manipulation Solver (AMSolver) simulator
and build a Vision-and-Language Manipulation benchmark (VLMbench) …

