GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | allainews.com

April 17, 2024, 4:47 a.m. | Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

cs.CL updates on arXiv.org (arxiv.org)

arXiv:2402.16846v2 Announce Type: replace-cross
Abstract: Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Language Models to holistic segmentation. GROUNDHOG incorporates a masked feature extractor and converts extracted features into visual entity tokens for the MLLM backbone, …
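The abstract contrasts two ways of grounding language to image regions. The first encodes each object as a bounding box written out as discrete location tokens. The second, used by GROUNDHOG, works from masked, pixel-level features. The sketch below makes that contrast concrete. It is not GROUNDHOG's implementation. The function names `box_to_location_tokens` and `mask_pool`, the 100-bin quantization, the `<loc_k>` token format, and mean pooling over a mask are all hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def box_to_location_tokens(box: Box, image_w: int, image_h: int,
                           num_bins: int = 100) -> List[str]:
    """Box-based grounding (hypothetical format): four discrete location tokens."""
    def to_bin(value: float, size: int) -> int:
        # Normalize the coordinate to [0, 1], scale it to a bin index,
        # and clamp so the right/bottom image edge maps to the last bin.
        return min(max(int(value / size * num_bins), 0), num_bins - 1)

    x0, y0, x1, y1 = box
    bins = [to_bin(x0, image_w), to_bin(y0, image_h),
            to_bin(x1, image_w), to_bin(y1, image_h)]
    return [f"<loc_{b}>" for b in bins]


def mask_pool(features: Sequence[Sequence[Sequence[float]]],
              mask: Sequence[Sequence[int]]) -> List[float]:
    """Mask-based grounding (assumed mean pooling): average the per-pixel
    feature vectors selected by a binary mask into one entity vector."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, keep in zip(feat_row, mask_row):
            if keep:
                for k in range(dim):
                    total[k] += vec[k]
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


if __name__ == "__main__":
    # A box covering the center of a 640x480 image.
    print(box_to_location_tokens((320, 240, 480, 360), 640, 480))
    # A 2x2 feature map with 2-dim features and a diagonal mask.
    feats = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(mask_pool(feats, [[1, 0], [0, 1]]))
```

Running it prints `['<loc_50>', '<loc_50>', '<loc_75>', '<loc_75>']` and `[4.0, 5.0]`. The diagonal mask selects two pixels that no single axis-aligned box can isolate, since any box covering both also covers the other two. This is a small instance of the pixel-level information the abstract says box tokens lack.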
