EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
June 11, 2024, 4:42 a.m. | Mengfei Du, Binhao Wu, Zejun Li, Xuanjing Huang, Zhongyu Wei
cs.CL updates on arXiv.org
Abstract: The recent rapid development of Large Vision-Language Models (LVLMs) has indicated their potential for embodied tasks. However, the critical skill of spatial understanding in embodied environments has not been thoroughly evaluated, leaving the gap between current LVLMs and qualified embodied intelligence unknown. Therefore, we construct EmbSpatial-Bench, a benchmark for evaluating embodied spatial understanding of LVLMs. The benchmark is automatically derived from embodied scenes and covers 6 spatial relationships from an egocentric perspective. Experiments expose the insufficient capacity of …