April 16, 2024, 4:45 a.m. | Yutaro Yamada, Yingtian Tang, Yoyo Zhang, Ilker Yildirim

cs.LG updates on arXiv.org

arXiv:2212.12043v2 Announce Type: replace-cross
Abstract: Large-scale vision-language models such as CLIP have shown impressive performance on zero-shot image classification and image-to-text retrieval. However, such performance does not realize in tasks that require a finer-grained correspondence between vision and language, such as Visual Question Answering (VQA). As a potential cause of the difficulty of applying these models to VQA and similar tasks, we report an interesting phenomenon of vision-language models, which we call the Concept Association Bias (CAB). We find that …

