An Inverse Scaling Law for CLIP Training. (arXiv:2305.07017v1 [cs.CV])
cs.CV updates on arXiv.org
CLIP, the first foundation model that connects images and text, has enabled
many recent breakthroughs in computer vision. However, its associated training
cost is prohibitively high, imposing a significant barrier to its widespread
exploration. In this paper, we present a surprising finding that there exists
an inverse scaling law for CLIP training, whereby the larger the image/text
encoders used, the shorter the sequence length of image/text tokens that can be
applied in training. Moreover, we showcase that the strategy for …
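The inverse scaling law above implies a practical trade-off: when training a larger image/text encoder, the number of input tokens per sample can be reduced without losing proportionate quality. The abstract does not specify the reduction strategy, so the sketch below uses random patch-token masking purely as one plausible illustration (the `shorten_image_tokens` helper and the keep ratios are hypothetical, not from the paper):

```python
import numpy as np

def shorten_image_tokens(patch_tokens, keep_ratio, rng):
    """Randomly keep a fraction of image patch tokens.

    Hypothetical token-reduction step: sample `keep_ratio` of the
    tokens without replacement, preserving their original order.
    """
    n = patch_tokens.shape[0]
    n_keep = max(1, int(n * keep_ratio))
    idx = rng.choice(n, size=n_keep, replace=False)
    return patch_tokens[np.sort(idx)]

rng = np.random.default_rng(0)
# A ViT-style image: 14x14 = 196 patch tokens of dimension 768.
tokens = rng.normal(size=(196, 768))

# Illustrative pairing: a larger encoder tolerates a smaller keep ratio,
# shortening the sequence (and thus the attention cost) during training.
short = shorten_image_tokens(tokens, keep_ratio=0.25, rng=rng)
print(short.shape)  # (49, 768)
```

Since self-attention cost grows quadratically with sequence length, keeping 25% of the tokens cuts the attention FLOPs for that layer by roughly 16x, which is the kind of saving that could offset a larger encoder's per-token cost.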