April 1, 2024, 4:45 a.m. | Jaisidh Singh, Ishaan Shrivastava, Mayank Vatsa, Richa Singh, Aparna Bharati

cs.CV updates on arXiv.org

arXiv:2403.20312v1 Announce Type: new
Abstract: Existing vision-language models (VLMs) treat text descriptions as a single unit, confusing individual concepts in a prompt and impairing visual semantic matching and reasoning. An important aspect of reasoning in logic and language is negation. This paper highlights the limitations of popular VLMs, such as CLIP, at understanding the implications of negation, i.e., the effect of the word "not" in a given prompt. To enable evaluation of VLMs on fluent prompts with negations, we present CC-Neg, …
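The failure mode described in the abstract is easy to probe informally: score an image against an affirmed caption and its negated counterpart and see whether the model actually separates them. The sketch below is not the paper's CC-Neg protocol; it is a minimal illustration using the Hugging Face `transformers` CLIP API, where the checkpoint name, example image URL, and caption pair are assumptions chosen for demonstration.

```python
# Minimal sketch (not the CC-Neg benchmark): check whether CLIP's image-text
# similarity distinguishes a caption from its negated counterpart.
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; the paper may use a different backbone.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any test image works; this COCO demo image (two cats on a couch) is a common example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# An affirmed caption and its negation. A model that understands "not"
# should strongly prefer the affirmed caption for a matching image.
captions = [
    "a photo of a cat",
    "a photo that does not contain a cat",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # shape: (1, 2)

probs = logits_per_image.softmax(dim=-1).squeeze()
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.3f}  {caption}")
# If the two scores come out close, the model is effectively ignoring "not",
# which is the behavior the abstract attributes to popular VLMs.
```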

