Boosting Visual-Language Models by Exploiting Hard Samples
March 12, 2024, 4:49 a.m. | Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi
cs.CV updates on arXiv.org
Abstract: Contrastive Language-Image Pre-training (CLIP) has become the standard for learning cross-modal representations between images and text. Efforts to improve its capabilities typically demand the collection of additional data and retraining with new loss functions. While effective, the added requirements limit their practical use due to the increased resource and time investments needed. In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training …
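The truncated abstract does not describe HELIP's actual mechanism, so the sketch below is only a generic illustration of what "exploiting hard samples" can look like in a CLIP-style contrastive objective: computing the symmetric InfoNCE loss over a batch and up-weighting the image-text pairs with the highest per-pair loss. The function name, the top-k weighting scheme, and all hyperparameters are illustrative assumptions, not the paper's method.

```python
# A minimal sketch (not HELIP itself) of hard-sample emphasis in a
# CLIP-style contrastive loss. All names and the weighting scheme are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_loss_with_hard_sample_weighting(
    image_emb, text_emb, temperature=0.07, hard_fraction=0.25, hard_weight=2.0
):
    """Symmetric InfoNCE loss that up-weights the hardest image-text pairs.

    image_emb, text_emb: (batch, dim) L2-normalized embeddings.
    hard_fraction: fraction of the batch treated as "hard" (highest loss).
    hard_weight: multiplier applied to those hard pairs (assumed scheme).
    """
    # (B, B) similarity matrix; diagonal entries are the matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Per-pair losses in both directions (image->text and text->image).
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")
    per_pair = 0.5 * (loss_i2t + loss_t2i)

    # Treat the top hard_fraction of pairs (largest loss) as hard samples
    # and up-weight them in the final average.
    k = max(1, int(hard_fraction * per_pair.numel()))
    hard_idx = per_pair.topk(k).indices
    weights = torch.ones_like(per_pair)
    weights[hard_idx] = hard_weight

    return (weights * per_pair).sum() / weights.sum()

# Usage with random embeddings standing in for encoder outputs:
if __name__ == "__main__":
    img = F.normalize(torch.randn(8, 512), dim=-1)
    txt = F.normalize(torch.randn(8, 512), dim=-1)
    print(clip_loss_with_hard_sample_weighting(img, txt).item())
```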