all AI news
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
March 13, 2024, 4:42 a.m. | Chenyu You, Yifei Min, Weicheng Dai, Jasjeet S. Sekhon, Lawrence Staib, James S. Duncan
cs.LG updates on arXiv.org arxiv.org
Abstract: Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models becomes both time-intensive and computationally costly. Additionally, these tuned models tend to become highly specialized, limiting their practicality for real-world deployment; (ii) recent studies indicate that pre-trained vision-language classifiers may overly depend on spurious features -- patterns that correlate with the target in training data, but are not …
abstract annotations arxiv become clip cs.cv cs.lg diverse fine-tuning however language language models modal multi-modal pain paradigm pre-trained models robustness success tasks type vision vision-language models
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Research Scientist
@ Meta | Menlo Park, CA
Principal Data Scientist
@ Mastercard | O'Fallon, Missouri (Main Campus)