Feb. 8, 2024, 5:46 a.m. | Jirayu Burapacheep, Ishan Gaur, Agam Bhatia, Tristan Thrush

cs.CL updates on arXiv.org

This paper introduces the ColorSwap dataset, designed to assess and improve the proficiency of multimodal models in matching objects with their colors. The dataset comprises 2,000 unique image-caption pairs, grouped into 1,000 examples. Each example includes a caption-image pair along with a "color-swapped" pair. We follow the Winoground schema: the two captions in an example have the same words, but the color words have been rearranged to modify different objects. The dataset was created through a novel blend …
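To make the Winoground-style pairing concrete, here is a minimal Python sketch under stated assumptions. The field names (`caption_0`, `image_0`, ...) and the helpers `is_word_permutation` and `winoground_scores` are hypothetical, not the dataset's published format or API. The example captions are invented for illustration. The scoring assumes that Winoground's text, image, and group scores carry over unchanged, which the abstract does not state.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ColorSwapExample:
    # Hypothetical field names; the released dataset may use different ones.
    caption_0: str
    caption_1: str
    image_0: str  # e.g. a path or ID for the image that matches caption_0
    image_1: str  # image that matches caption_1


def is_word_permutation(a: str, b: str) -> bool:
    """True if both captions use exactly the same multiset of words.

    Uses simple lowercase whitespace tokenization (an assumption).
    """
    return Counter(a.lower().split()) == Counter(b.lower().split())


def winoground_scores(sim):
    """sim[i][j] is a model's similarity between caption i and image j (2x2).

    Assumed Winoground-style metrics:
    - text: for each image, its own caption scores higher than the other caption
    - image: for each caption, its own image scores higher than the other image
    - group: both conditions hold
    """
    text = sim[0][0] > sim[1][0] and sim[1][1] > sim[0][1]
    image = sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]
    return {"text": text, "image": image, "group": text and image}


# Invented example: same words, color words attached to different objects.
ex = ColorSwapExample(
    caption_0="a red mug next to a blue plate",
    caption_1="a blue mug next to a red plate",
    image_0="img_0.png",
    image_1="img_1.png",
)
assert is_word_permutation(ex.caption_0, ex.caption_1)
```

A model passes the group criterion on an example only if it separates both captions and both images. A model that ignores which object each color modifies would score the two captions nearly identically, so it would tend to fail.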

cs.CL cs.CV

