April 18, 2024, 4:45 a.m. | Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, Jinqiao Wang

cs.CV updates on arXiv.org arxiv.org

arXiv:2404.10357v2 Announce Type: replace
Abstract: Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential. However, one key limitation is the lack of diversity in prompt templates, whether they are hand-crafted or learned through additional modules. This limitation restricts the capabilities of pretrained VLMs and can result in incorrect predictions in downstream tasks. To address this challenge, we …
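The abstract refers to context optimization ("prompt tuning") for CLIP, in which a hand-crafted template such as "a photo of a {class}" is replaced by learned continuous context vectors while the CLIP encoders stay frozen. The sketch below illustrates that general CoOp-style idea only; it is not the method proposed in this paper (the abstract is truncated), and the class-embedding placeholders, context length, and embedding dimension are assumptions.

```python
# Minimal sketch of CoOp-style context optimization ("prompt tuning") for a
# CLIP-like model. Illustrative only: not this paper's method. The placeholder
# class embeddings, n_ctx=16, and dim=512 are assumptions for the example.
import torch
import torch.nn as nn


class LearnablePromptContext(nn.Module):
    """Learned context vectors prepended to each class-name embedding.

    Instead of a hand-crafted template like "a photo of a {class}", n_ctx
    continuous vectors are optimized on the downstream task while the
    pretrained text/image encoders remain frozen.
    """

    def __init__(self, n_classes: int, n_ctx: int = 16, dim: int = 512):
        super().__init__()
        # Shared learnable context, small random init (template-based init is also common).
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Placeholder for frozen class-name token embeddings; in practice these
        # come from CLIP's tokenizer and token-embedding layer.
        self.register_buffer("class_embeds", torch.randn(n_classes, 1, dim))

    def forward(self) -> torch.Tensor:
        # Prepend the learned context to every class embedding, producing
        # (n_classes, n_ctx + 1, dim) prompt sequences for the text encoder.
        ctx = self.ctx.unsqueeze(0).expand(self.class_embeds.size(0), -1, -1)
        return torch.cat([ctx, self.class_embeds], dim=1)


# Usage: prompts = LearnablePromptContext(n_classes=100)()
# Only `ctx` receives gradients, so only the prompt is tuned downstream.
```

Because a single learned context is shared across all classes, the resulting prompts lack diversity, which is the limitation the abstract highlights.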

arxiv, cs.cv, knowledge, language models, optimization, prompt learning, representation, vision-language models
