Aug. 1, 2022, 1:11 a.m. | Tejas Srinivasan, Xiang Ren, Jesse Thomason

cs.CL updates on arXiv.org

Aligning image and text encoders from scratch using contrastive learning
requires large amounts of paired image-text data. We alleviate this need by
aligning individually pre-trained language and vision representation models
using a much smaller amount of paired data, augmented with a curriculum
learning algorithm to learn fine-grained vision-language alignments. TOnICS
(Training with Ontology-Informed Contrastive Sampling) initially samples
minibatches whose image-text pairs contain a wide variety of objects to learn
object-level alignment, and progressively samples minibatches where all
image-text pairs contain …
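The sampling idea in the abstract lends itself to a short sketch. The following is a minimal, hypothetical illustration, not the authors' implementation: minibatches are initially drawn uniformly over image-text pairs (a wide variety of objects, so object identity alone suffices for contrastive matching), and as training progresses they are increasingly drawn from a single object class, producing harder in-batch negatives that push the model toward finer-grained alignment. The `object` field, the function names, and the linear schedule are all assumptions for illustration.

```python
import random
from collections import defaultdict


def sample_minibatch(pairs, batch_size, same_object_prob):
    """Sample a minibatch of image-text pairs.

    `pairs` is a list of dicts with keys "image", "caption", and "object"
    (an ontology label for a salient object in the pair). With probability
    `same_object_prob`, every pair in the batch shares one object label
    (hard negatives, finer-grained alignment); otherwise the batch is drawn
    uniformly, giving a wide variety of objects (easy, object-level negatives).
    """
    by_object = defaultdict(list)
    for p in pairs:
        by_object[p["object"]].append(p)

    if random.random() < same_object_prob:
        # Hard regime: all pairs share an object, so the contrastive loss
        # must rely on context beyond object identity.
        eligible = [obj for obj, ps in by_object.items() if len(ps) >= batch_size]
        if eligible:
            obj = random.choice(eligible)
            return random.sample(by_object[obj], batch_size)
    # Easy regime (or fallback): uniform sampling over all pairs.
    return random.sample(pairs, batch_size)


def same_object_prob_schedule(step, total_steps):
    # Hypothetical linear curriculum: start with diverse batches,
    # end with mostly same-object batches.
    return min(1.0, step / total_steps)
```

A training loop would call `same_object_prob_schedule(step, total_steps)` each iteration and pass the result to `sample_minibatch`, so early batches contain many different objects and later batches concentrate on a single one.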

Tags: alignment, arxiv, curriculum, curriculum learning, cv, data, language, learning, vision
