all AI news
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
March 12, 2024, 4:49 a.m. | Sravanti Addepalli, Ashish Ramayee Asokan, Lakshay Sharma, R. Venkatesh Babu
cs.CV updates on arXiv.org arxiv.org
Abstract: Vision-Language Models (VLMs) such as CLIP are trained on large amounts of image-text pairs, resulting in remarkable generalization across several data distributions. However, in several cases, their expensive training and data collection/curation costs do not justify the end application. This motivates a vendor-client paradigm, where a vendor trains a large-scale VLM and grants only input-output access to clients on a pay-per-query basis in a black-box setting. The client aims to minimize inference cost by distilling …
abstract application arxiv cases classification client clip collection costs cs.cv curation data data collection domain however image language language models paradigm text the end training type vendor vision vision-language models vlms
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
1 day, 15 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Intern Large Language Models Planning (f/m/x)
@ BMW Group | Munich, DE
Data Engineer Analytics
@ Meta | Menlo Park, CA | Remote, US