March 15, 2024, 4:45 a.m. | Yequan Bie, Luyang Luo, Zhixuan Chen, Hao Chen

cs.CV updates on arXiv.org

arXiv:2403.09410v1 Announce Type: new
Abstract: Using the powerful representations of large vision-language models (VLMs) to accomplish various downstream tasks has attracted increasing attention. Within this research field, soft prompt learning has become a representative approach for efficiently adapting VLMs such as CLIP to tasks like image classification. However, most existing prompt learning methods learn text tokens that are unexplainable, so they cannot satisfy the stringent interpretability requirements of Explainable Artificial Intelligence (XAI) in high-stakes scenarios like healthcare. To address this issue, …

