March 12, 2024, 4:52 a.m. | Alberto Testoni, Juell Sprott, Sandro Pezzelle

cs.CL updates on arXiv.org

arXiv:2403.06935v1 Announce Type: new
Abstract: While human speakers use a variety of different expressions when describing the same object in an image, giving rise to a distribution of plausible labels driven by pragmatic constraints, the extent to which current Vision & Language Large Language Models (VLLMs) can mimic this crucial feature of language use is an open question. This applies to common, everyday objects, but it is particularly interesting for uncommon or novel objects for which a category label may …

