April 12, 2024, 5:08 a.m. | /u/KennyMcKormick_

r/MachineLearning (www.reddit.com)

Paper: [https://arxiv.org/abs/2403.20330](https://arxiv.org/abs/2403.20330)

Evaluation Code: [https://github.com/open-compass/VLMEvalKit](https://github.com/open-compass/VLMEvalKit)

Abstract:

Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation works and identify two primary issues: 1) Visual content is unnecessary for many samples. The answers can be directly inferred from the questions and options, or the world knowledge embedded in LLMs. This phenomenon is prevalent across current benchmarks. For instance, GeminiPro achieves 42.9% on the MMMU benchmark without any visual …
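To make the "visual content is unnecessary" issue concrete, here is a minimal sketch of a text-only probe: feed only the question and options (no image) to an LLM and measure how often it still picks the correct answer. This is an illustration, not the paper's released pipeline; the `query_llm` helper and the `samples.jsonl` layout (fields `question`, `options`, `answer`) are hypothetical stand-ins you would replace with your own model client and benchmark export.

```python
import json

def query_llm(prompt: str) -> str:
    """Hypothetical text-only LLM call; replace the placeholder with a real client.

    Expected to return a single option letter such as 'A', 'B', 'C', or 'D'.
    """
    return "A"  # placeholder so the sketch runs end to end

def text_only_accuracy(path: str) -> float:
    """Score a multiple-choice benchmark with the images withheld."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            sample = json.loads(line)  # assumed fields: question, options (list), answer (letter)
            options = "\n".join(
                f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(sample["options"])
            )
            prompt = (
                "Answer the following multiple-choice question. "
                "Reply with a single letter.\n\n"
                f"{sample['question']}\n{options}"
            )
            prediction = query_llm(prompt).strip().upper()[:1]
            correct += prediction == sample["answer"]
            total += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    # A text-only score well above random chance suggests the answers leak
    # through the questions/options or the LLM's world knowledge, which is
    # exactly the failure mode the abstract describes.
    print(f"text-only accuracy: {text_only_accuracy('samples.jsonl'):.1%}")
```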
