If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
March 26, 2024, 4:43 a.m. | Reza Esfandiarpoor, Cristina Menghini, Stephen H. Bach
cs.LG updates on arXiv.org
Abstract: Recent works often assume that Vision-Language Model (VLM) representations are based on visual attributes like shape. However, it is unclear to what extent VLMs prioritize this information to represent concepts. We propose Extract and Explore (EX2), a novel approach to characterize important textual features for VLMs. EX2 uses reinforcement learning to align a large language model with VLM preferences and generates descriptions that incorporate the important features for the VLM. Then, we inspect the descriptions …
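To make the idea concrete, here is a minimal sketch of the preference signal EX2 builds on: a VLM scores candidate descriptions by the similarity between their text embeddings and an image embedding, and that score can serve as a reward for aligning a language model. This is a hypothetical illustration, not the paper's implementation: the toy vectors below stand in for real VLM (e.g. CLIP) encoders, and `rank_descriptions` is an invented helper name.

```python
# Hypothetical sketch: rank candidate concept descriptions by cosine
# similarity to an image embedding, i.e. by "VLM preference".
# In EX2 this preference would come from a real VLM and be used as an
# RL reward; here, hand-made toy vectors stand in for the encoders.

from math import sqrt

def cosine(u, v):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_descriptions(image_emb, candidates):
    """Sort (text, embedding) candidates by similarity to the image, high to low."""
    scored = [(cosine(image_emb, emb), text) for text, emb in candidates]
    return [text for _, text in sorted(scored, reverse=True)]

# Toy example: one fake image embedding, three candidate descriptions
# with made-up text embeddings (attribute-rich, habitual, vague).
image_emb = [0.9, 0.1, 0.2]
candidates = [
    ("a red round fruit", [0.8, 0.2, 0.1]),
    ("a photo of an apple", [0.7, 0.5, 0.4]),
    ("something edible", [0.1, 0.9, 0.3]),
]
print(rank_descriptions(image_emb, candidates))
```

Inspecting which kinds of descriptions a real VLM ranks highest (visual attributes versus habitual phrases like "a photo of") is, per the abstract, how EX2 characterizes the textual features the model actually relies on.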