Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | allainews.com

April 12, 2024, 4:46 a.m. | Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

cs.CV updates on arXiv.org

arXiv:2404.07973v1 Announce Type: new
Abstract: While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by the pre-trained, fixed visual encoder and fails to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any-resolution grounding and referring: a flexible approach that effortlessly handles higher image resolutions, improving the model's ability to process and …
