Feb. 6, 2024, 5:52 a.m. | Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jin…

cs.CV updates on arXiv.org

Rapid progress in multimodal large language models (MLLMs) highlights the need for challenging yet realistic benchmarks in the academic community. Existing benchmarks focus primarily on simple natural-image understanding; Multi, by contrast, is a cutting-edge benchmark for MLLMs, offering a comprehensive dataset for evaluating them on complex figures and tables as well as scientific questions. Reflecting current realistic examination styles, the benchmark provides multimodal inputs and requires responses that are either precise or open-ended, similar to real-life school tests. It …
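To make the distinction between precise and open-ended responses concrete, here is a minimal sketch of how a benchmark item with a precise gold answer might be scored by exact match. All names here (`BenchmarkItem`, `score_precise`, the example file and question) are hypothetical illustrations, not the Multi paper's actual data format or evaluation protocol:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkItem:
    """A hypothetical Multi-style item: an image plus a question,
    with a gold answer and a flag for open-ended grading."""
    image_path: str
    question: str
    gold_answer: str
    open_ended: bool = False

def score_precise(prediction: str, gold: str) -> float:
    """Exact-match scoring after light normalization, a common
    choice for precise-answer benchmark questions."""
    def norm(s: str) -> str:
        return s.strip().lower()
    return 1.0 if norm(prediction) == norm(gold) else 0.0

# Example: a precise-answer item scored against a model prediction.
item = BenchmarkItem("figure3.png", "What is the slope of the line?", "2")
score = score_precise(" 2 ", item.gold_answer)  # exact match after normalization -> 1.0
```

Open-ended items would instead need rubric- or model-based grading, which is why benchmarks of this kind typically report the two answer types separately.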

