Dec. 28, 2023, 4:22 p.m. | /u/steeveHuang

Deep Learning www.reddit.com

We've just wrapped up a collaborative study with Columbia University and the University of Macau probing the capabilities of Large Vision-Language Models (LVLMs) in understanding and describing charts. The findings are quite startling.

Despite recent advancements, our research reveals that even the most capable LVLMs, such as GPT-4V and Bard, fall short. A striking 🚨**81.27%** (321/395) 🚨 of the captions they generated contained factual errors, misinterpreting the data in the charts. This suggests a significant gap …
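For anyone who wants to sanity-check the headline figure, the percentage follows directly from the counts quoted above. A minimal sketch in Python, using only the two numbers reported in the post:

```python
# Counts taken from the post: 321 of 395 generated captions
# contained at least one factual error.
captions_with_errors = 321
total_captions = 395

error_rate = captions_with_errors / total_captions
print(f"Factual error rate: {error_rate:.2%}")  # -> 81.27%
```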
