Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model | allainews.com

June 21, 2024, 4:52 a.m. | Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

cs.CV updates on arXiv.org arxiv.org

arXiv:2312.12423v2 Announce Type: replace
Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over …

arxiv cs.ai cs.cv designing general jack language language model master replace tasks type vision vision-language

More from arxiv.org / cs.CV updates on arXiv.org

InstantGroup: Instant Template Generation for Scalable Group of Brain MRI Registration 13 hours ago | arxiv.org

abstract arxiv brain costs +15

Visual Odometry with Neuromorphic Resonator Networks 13 hours ago | arxiv.org

abstract arxiv cs.ai cs.cv +15

CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video 13 hours ago | arxiv.org

arxiv cs.cv dynamic neural radiance field +4

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models 13 hours ago | arxiv.org

arxiv cs.cv instruction-tuned language +6

Towards Training-free Open-world Segmentation via Image Prompt Foundation Models 13 hours ago | arxiv.org

abstract arxiv computer computer vision +33

Re-initialization-free Level Set Method via Molecular Beam Epitaxy Equation Regularization for Image Segmentation 13 hours ago | arxiv.org

abstract arxiv become continuity +15

ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer 13 hours ago | arxiv.org

arxiv cs.ai cs.cv cs.cy +9

Unsupervised Open-Vocabulary Object Localization in Videos 13 hours ago | arxiv.org

abstract advances arxiv attention +21

Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network 13 hours ago | arxiv.org

arxiv compensation cs.cv eess.iv +6

AI Focused Biochemistry Postdoctoral Fellow

@ Lawrence Berkeley National Lab | Berkeley, CA

View on ai-jobs.net

Senior Data Engineer

@ Displate | Warsaw

View on ai-jobs.net

PhD Student AI simulation electric drive (f/m/d)

@ Volkswagen Group | Kassel, DE, 34123

View on ai-jobs.net

AI Privacy Research Lead

@ Leidos | 6314 Remote/Teleworker US

View on ai-jobs.net

Senior Platform System Architect, Silicon

@ Google | New Taipei, Banqiao District, New Taipei City, Taiwan

View on ai-jobs.net

Fabrication Hardware Litho Engineer, Quantum AI

@ Google | Goleta, CA, USA

View on ai-jobs.net