March 29, 2024, 4:45 a.m. | Jiaxing Chen, Yuxuan Liu, Dehu Li, Xiang An, Ziyong Feng, Yongle Zhao, Yin Xie

cs.CV updates on arXiv.org

arXiv:2403.19322v1 Announce Type: new
Abstract: The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent capabilities in instruction following and reasoning, has greatly advanced the field of visual reasoning. However, constrained by their non-lossless image tokenization, most MLLMs fall short of comprehensively capturing details of text and objects, especially in high-resolution images. To address this, we propose P2G, a novel framework for plug-and-play grounding of reasoning in MLLMs. Specifically, P2G exploits the tool-usage potential of MLLMs to employ …

