Curriculum Learning for Data-Efficient Vision-Language Alignment. (arXiv:2207.14525v1 [cs.CV])
cs.CL updates on arXiv.org arxiv.org
Aligning image and text encoders from scratch using contrastive learning
requires large amounts of paired image-text data. We alleviate this need by
aligning individually pre-trained language and vision representation models
using a much smaller amount of paired data, augmented with a curriculum
learning algorithm to learn fine-grained vision-language alignments. TOnICS
(Training with Ontology-Informed Contrastive Sampling) initially samples
minibatches whose image-text pairs contain a wide variety of objects to learn
object-level alignment, and progressively samples minibatches where all
image-text pairs contain …
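
A minimal sketch of the initial sampling stage described above, in which each minibatch covers a wide variety of objects. It assumes every image-text pair carries a single object label; the function name `sample_diverse_minibatch` and the tuple layout are hypothetical illustrations, not taken from the paper. The later curriculum stage is truncated in this listing, so it is not sketched here.

```python
import random
from collections import defaultdict


def sample_diverse_minibatch(pairs, batch_size, rng=None):
    """Draw a minibatch in which every pair has a different object label.

    pairs: list of (image_id, caption, object_label) tuples (assumed layout).
    batch_size: number of pairs to draw; must not exceed the number of
        distinct object labels present in `pairs`.
    rng: optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()

    # Group the paired data by object label.
    by_object = defaultdict(list)
    for pair in pairs:
        by_object[pair[2]].append(pair)

    if batch_size > len(by_object):
        raise ValueError("batch_size exceeds the number of distinct objects")

    # Pick distinct objects first, then one pair per chosen object.
    chosen_objects = rng.sample(sorted(by_object), batch_size)
    return [rng.choice(by_object[obj]) for obj in chosen_objects]
```

Because every pair in such a batch shows a different object, the other pairs in the batch act as negatives that differ at the object level, which matches the object-level alignment goal stated for the first stage.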