March 28, 2024, 4:42 a.m. | Reza Abbasi, Mohammad Samiei, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah

arXiv:2403.18525v1 Announce Type: cross
Abstract: Vision-language models, such as CLIP, have shown promising Out-of-Distribution (OoD) generalization under various types of distribution shifts. Recent studies have attempted to investigate the leading cause of this capability. In this work, we follow the same path, but focus on a specific type of OoD data - images with novel compositions of attribute-object pairs - and study whether such models can successfully classify those images into composition classes. We carefully designed an authentic image test dataset …
