June 14, 2024, 4:48 a.m. | Jaeseok Byun, Dohoon Kim, Taesup Moon

cs.CV updates on arXiv.org arxiv.org

arXiv:2312.06112v2 Announce Type: replace
Abstract: We consider a critical issue of false negatives in Vision-Language Pre-training (VLP), a challenge that arises from the inherent many-to-many correspondence of image-text pairs in large-scale web-crawled datasets. The presence of false negatives can impede achieving optimal performance and even lead to a significant performance drop. To address this challenge, we propose MAFA (MAnaging FAlse negatives), which consists of two pivotal components building upon the recently developed GRouped mIni-baTch sampling (GRIT) strategy: 1) an efficient …

arxiv cs.ai cs.cv false language pre-training replace training type vision vision-language

Senior Data Engineer

@ Displate | Warsaw

Senior Principal Software Engineer

@ Oracle | Columbia, MD, United States

Software Engineer for Manta Systems

@ PXGEO | Linköping, Östergötland County, Sweden

DevOps Engineer

@ Teradyne | Odense, DK

LIDAR System Engineer Trainee

@ Valeo | PRAGUE - PRA2

Business Applications Administrator

@ Allegro | Poznań, Poland