May 8, 2024, 4:43 a.m. | Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li

cs.LG updates on arXiv.org

arXiv:2312.07661v3 Announce Type: replace-cross
Abstract: Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions. To alleviate these issues, we introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality …
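The abstract is cut off before the framework is described in detail, so the following is only a minimal sketch of the general idea of progressively filtering out irrelevant texts, not the authors' method. It assumes a hypothetical scoring function `score_fn` that rates each candidate text relative to the other candidates; the name `recurrent_filter`, the threshold, the toy scorer, and its affinity values are all made up for illustration. Mask refinement is not modeled here.

```python
from typing import Callable, Dict, List


def recurrent_filter(
    texts: List[str],
    score_fn: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.5,
    max_steps: int = 10,
) -> List[str]:
    """Repeatedly drop low-scoring texts until the kept set stops changing.

    Hypothetical helper: score_fn stands in for whatever model rates how
    well each text matches the image, given the current candidate set.
    """
    current = list(texts)
    for _ in range(max_steps):
        if not current:
            break
        scores = score_fn(current)
        kept = [t for t in current if scores[t] >= threshold]
        if kept == current:
            # Fixed point reached: another pass would not remove anything.
            break
        current = kept
    return current


# Toy stand-in for a vision-language model (assumed values, not real data).
TOY_AFFINITY = {"cat": 5.0, "grass": 3.0, "car": 1.5, "boat": 0.5, "plane": 0.2}


def toy_score(candidates: List[str]) -> Dict[str, float]:
    """Score each text as its affinity divided by the mean candidate affinity."""
    if not candidates:
        return {}
    mean = sum(TOY_AFFINITY[t] for t in candidates) / len(candidates)
    return {t: TOY_AFFINITY[t] / mean for t in candidates}


if __name__ == "__main__":
    vocabulary = ["cat", "grass", "car", "boat", "plane"]
    print(recurrent_filter(vocabulary, toy_score, threshold=0.5))
```

In this toy setup the scores depend on which candidates remain, so a text that survives the first pass ("car") can still be dropped on a later pass once weaker distractors are gone. That dependence on the current candidate set is the reason the loop is run until it reaches a fixed point instead of filtering once.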
