April 22, 2024, 4:42 a.m. | Juncheng Yang, Zuchao Li, Shuai Xie, Weiping Zhu, Wei Yu, Shijun Li

cs.LG updates on arXiv.org

arXiv:2404.12588v1 Announce Type: cross
Abstract: Adapter-based parameter-efficient transfer learning has achieved exciting results in vision-language models. Traditional adapter methods often require training or fine-tuning, which runs into challenges such as insufficient samples or resource limitations. While some methods avoid training altogether by leveraging an image-modality cache and retrieval, they overlook the text modality's importance and cross-modal cues for the efficient adaptation of parameters in vision-language models. This work introduces a cross-modal parameter-efficient approach named XMAdapter. XMAdapter establishes cache models for …
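To make the cache-and-retrieval idea concrete, here is a minimal sketch of a training-free cross-modal cache adapter in the spirit the abstract describes. It is an assumption-laden illustration, not the paper's actual method: the function name, tensor shapes, and the fixed affinity ratio are hypothetical, the exponential sharpening follows the well-known Tip-Adapter formulation, and XMAdapter's dynamic ratio adjustment and hard-sample exploration are not shown.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of cross-modal cache retrieval (not the paper's code).
# Assumed precomputed CLIP-style features:
#   img_keys: (N, D) cached image features of few-shot training samples
#   txt_keys: (N, D) cached text features paired with those samples
#   values:   (N, C) one-hot labels of the cached samples
#   clf:      (C, D) zero-shot classifier weights (class text embeddings)

def xmadapter_logits(query, img_keys, txt_keys, values, clf,
                     beta=5.5, alpha=1.0, ratio=0.5):
    """Fuse image- and text-cache retrieval with zero-shot logits.

    `ratio` stands in for the affinity ratio that balances the two
    modalities; the paper adjusts it dynamically, here it is fixed.
    """
    query = F.normalize(query, dim=-1)                  # (B, D) test features
    # Cosine affinities between the query and each cached key.
    img_aff = query @ F.normalize(img_keys, dim=-1).T   # (B, N)
    txt_aff = query @ F.normalize(txt_keys, dim=-1).T   # (B, N)
    # Cross-modal fusion of the two affinity matrices.
    aff = ratio * img_aff + (1.0 - ratio) * txt_aff
    # Sharpen affinities and retrieve cached labels as soft logits
    # (Tip-Adapter-style exponential activation).
    cache_logits = torch.exp(-beta * (1.0 - aff)) @ values       # (B, C)
    zero_shot = 100.0 * query @ F.normalize(clf, dim=-1).T       # (B, C)
    # Residual blend of cache knowledge with the zero-shot prediction.
    return zero_shot + alpha * cache_logits
```

Because every quantity is precomputed and the fusion is a pair of matrix products, a sketch like this adapts the model with no gradient updates at all, which is what makes cache-based adapters attractive under the sample and resource constraints the abstract mentions.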
