Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data
April 26, 2024, 4:45 a.m. | Niclas Popp, Jan Hendrik Metzen, Matthias Hein
cs.CV updates on arXiv.org arxiv.org
Abstract: Multi-modal foundation models such as CLIP have showcased impressive zero-shot capabilities. However, their applicability in resource-constrained environments is limited due to their large number of parameters and high inference time. While existing approaches have scaled down the entire CLIP architecture, we focus on training smaller variants of the image encoder, which suffices for efficient zero-shot classification. The use of synthetic data has shown promise in distilling representations from larger teachers, resulting in strong few-shot and …
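The abstract describes distilling the representations of a large image encoder (such as CLIP's) into a smaller student using synthetic data, so that the compact student alone can serve zero-shot classification. The following is a minimal sketch of that feature-distillation idea, not the paper's actual method: the frozen "teacher" and trainable "student" are stand-in linear maps, synthetic images are random inputs, and the student is trained to match the teacher's L2-normalized embeddings with a mean-squared-error objective.

```python
import numpy as np

# Hypothetical feature-distillation sketch (NOT the paper's exact recipe):
# a frozen "teacher" encoder is mimicked by a trainable "student" whose
# outputs are regressed onto the teacher's embeddings on synthetic data.

rng = np.random.default_rng(0)

D_IN, D_EMB = 64, 16                               # input dim, embedding dim
W_teacher = rng.normal(size=(D_IN, D_EMB))         # frozen teacher (stand-in for CLIP)
W_student = rng.normal(size=(D_IN, D_EMB)) * 0.01  # student, trainable

def teacher_embed(x):
    """Teacher embeddings, L2-normalized as in CLIP-style encoders."""
    z = x @ W_teacher
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def distill_step(x, w_student, lr=0.1):
    """One gradient step on MSE between student outputs and teacher embeddings."""
    residual = x @ w_student - teacher_embed(x)
    grad = 2.0 * x.T @ residual / len(x)  # gradient of mean squared error
    return w_student - lr * grad

# "Synthetic data": random inputs standing in for generated images.
x_syn = rng.normal(size=(256, D_IN))

losses = []
for _ in range(200):
    loss = float(np.mean((x_syn @ W_student - teacher_embed(x_syn)) ** 2))
    losses.append(loss)
    W_student = distill_step(x_syn, W_student)

print(f"distillation loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In practice the teacher would be a pretrained CLIP image encoder, the student a smaller vision network with a projection head into the teacher's embedding space, and the synthetic inputs generated images; the training loop above only illustrates the shape of the objective.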