March 26, 2024, 4:48 a.m. | Jiangmeng Li, Wenyi Mo, Wenwen Qiang, Bing Su, Changwen Zheng, Hui Xiong, Ji-Rong Wen

cs.CV updates on arXiv.org (arxiv.org)

arXiv:2205.11100v2 Announce Type: replace
Abstract: Vision-language models are pre-trained by aligning image-text pairs in a common space to handle open-set visual concepts. To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts, i.e., classification weights synthesized from natural-language descriptions of task-relevant categories, to reduce the gap between the training and test tasks. However, how and which prompts improve inference performance remains unclear. In this paper, we explicitly clarify the importance …

arxiv cs.cv inference knowledge language language model prompt pruning type vision
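The abstract's core mechanism is that a prompt turns each category name into a sentence whose text embedding serves as that class's classification weight. Below is a minimal sketch of this prompt-based zero-shot setup in a CLIP-style model, not the paper's confounder-pruning method; it uses the Hugging Face transformers CLIP API, and the checkpoint name, prompt template, and image path are illustrative assumptions.

```python
# Sketch of prompt-based zero-shot classification with a CLIP-style model.
# Classification weights are synthesized by encoding one natural-language
# prompt per task-relevant category (a fixed, hand-crafted template here).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Task-relevant categories and the prompt template (both illustrative).
class_names = ["cat", "dog", "airplane"]
prompts = [f"a photo of a {name}" for name in class_names]

image = Image.open("example.jpg")  # placeholder input image

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    # Text embeddings of the prompts act as per-class weights.
    text_feats = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    image_feats = model.get_image_features(pixel_values=inputs["pixel_values"])

# Cosine similarity between the image feature and each prompt-derived class
# weight, scaled by the model's learned temperature, gives zero-shot logits.
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
logits = model.logit_scale.exp() * image_feats @ text_feats.t()
probs = logits.softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))
```

Learnable-prompt approaches mentioned in the abstract keep this same pipeline but replace the hand-crafted template with trainable context tokens prepended to the class name.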
