ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
June 13, 2024, 4:46 a.m. | Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chua
cs.CV updates on arXiv.org
Abstract: Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmarks may not adequately push the boundaries of modern VLMs due to the reliance on an LLM-only negative text generation pipeline. Consequently, …