FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
April 24, 2024, 4:44 a.m. | Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
Source: cs.CV updates on arXiv.org (arxiv.org)
Abstract: Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Although VLMs show an impressive ability to perform complex reasoning, current models often struggle to capture compositional information effectively and precisely on both the image and text sides. To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection …