April 15, 2024, 4:44 a.m. | Lianyu Hu, Tongkai Shi, Liqing Gao, Zekang Liu, Wei Feng

cs.CV updates on arXiv.org

arXiv:2404.08226v1 Announce Type: new
Abstract: The increase of web-scale weakly labelled image-text pairs has greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance over a series of downstream tasks. However, the massive model size and the scarcity of available data limit the applicability of fine-tuning the whole model on downstream tasks. Besides, fully fine-tuning the model easily forgets the generic essential knowledge acquired in the pretraining stage and overfits the downstream data. To enable …
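As context for the problem the abstract raises, a minimal sketch of the common parameter-efficient alternative to full fine-tuning is shown below: the pretrained CLIP backbone is frozen (preserving its pretrained knowledge) and only a small task-specific adapter head is trained. This is a generic illustration, not the method proposed in the paper; the model checkpoint, adapter shape, and `num_classes` are assumptions for the example.

```python
# Sketch: freeze a pretrained CLIP backbone and train only a lightweight adapter,
# the usual parameter-efficient alternative to fully fine-tuning the whole model.
import torch
import torch.nn as nn
from transformers import CLIPModel  # assumes the Hugging Face `transformers` package

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Freeze every backbone parameter so pretrained knowledge is not overwritten.
for param in clip.parameters():
    param.requires_grad = False

num_classes = 100  # hypothetical downstream label count
adapter = nn.Sequential(
    nn.Linear(clip.config.projection_dim, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)

# Only the adapter's parameters receive gradient updates.
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

def forward(pixel_values: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():  # backbone stays frozen
        feats = clip.get_image_features(pixel_values=pixel_values)
    return adapter(feats)
```

Because only the adapter (a few hundred thousand parameters) is trained, this setup mitigates the overfitting and forgetting issues the abstract describes when data are scarce.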

