[Research] MMStar: Are We on the Right Way for Evaluating Large Vision-Language Models?
April 12, 2024, 5:08 a.m. | /u/KennyMcKormick_
Machine Learning www.reddit.com
Evaluation Code: [https://github.com/open-compass/VLMEvalKit](https://github.com/open-compass/VLMEvalKit)
Abstract:
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation work and identify two primary issues: 1) Visual content is unnecessary for many samples. The answers can be directly inferred from the questions and options, or from the world knowledge embedded in LLMs. This phenomenon is prevalent across current benchmarks. For instance, GeminiPro achieves 42.9% on the MMMU benchmark without any visual …
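The abstract's first issue suggests a simple diagnostic: score a benchmark's multiple-choice samples with the images withheld, so any accuracy above chance reflects textual shortcuts or LLM world knowledge rather than visual understanding. A minimal sketch of that text-only baseline is below; the sample format and `answer_fn` stand-in are illustrative assumptions, not the MMStar or VLMEvalKit interfaces.

```python
def text_only_accuracy(samples, answer_fn):
    """Fraction of multiple-choice samples answered correctly from the
    question and options alone -- no image is ever passed to the model."""
    if not samples:
        return 0.0
    correct = 0
    for s in samples:
        # answer_fn would normally wrap an LLM/LVLM call; only text goes in.
        pred = answer_fn(s["question"], s["options"])
        if pred == s["answer"]:
            correct += 1
    return correct / len(samples)

# Toy samples; a real run would draw them from a benchmark such as MMMU.
samples = [
    {"question": "What color is the sky in the photo?",
     "options": ["A. blue", "B. green"], "answer": "A"},
    {"question": "How many cats are shown?",
     "options": ["A. one", "B. two"], "answer": "B"},
]

# Stand-in for a model: always guess option A.
always_a = lambda question, options: "A"

print(text_only_accuracy(samples, always_a))  # 0.5
```

If a capable LLM scores well above random chance under this protocol, the benchmark's visual content is doing less work than intended, which is the failure mode the paper reports for GeminiPro on MMMU.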