April 12, 2024, 4:46 a.m. | Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

cs.CV updates on arXiv.org arxiv.org

arXiv:2404.07973v1 Announce Type: new
Abstract: While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it poses certain limitations: constrained by the pre-trained fixed visual encoder and failed to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and referring: A flexible approach that effortlessly handles higher image resolution, improving the model's ability to process and …

abstract arxiv capability cs.cv encoder ferret language language model language models large language large language model large language models limitations llm regional tasks type understanding visual work

Senior Machine Learning Engineer

@ GPTZero | Toronto, Canada

Customer Data Analyst with Spanish

@ Michelin | Voluntari

HC Data Analyst - Senior

@ Leidos | 1662 Intelligence Community Campus - Bethesda MD

Healthcare Research & Data Analyst- Infectious, Niche, Rare Disease

@ Clarivate | Remote (121- Massachusetts)

Data Analyst (maternity leave cover)

@ Clarivate | R155-Belgrade

Sales Enablement Data Analyst (Remote)

@ CrowdStrike | USA TX Remote