Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
April 12, 2024, 4:46 a.m. | Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang
Source: cs.CV updates on arXiv.org
Abstract: While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by the pre-trained fixed visual encoder and fails to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and referring: A flexible approach that effortlessly handles higher image resolution, improving the model's ability to process and …