Extending CLIP's Image-Text Alignment to Referring Image Segmentation
April 9, 2024, 4:48 a.m. | Seoyeon Kim, Minguk Kang, Dongwon Kim, Jaesik Park, Suha Kwak
cs.CV updates on arXiv.org
Abstract: Referring Image Segmentation (RIS) is a cross-modal task that aims to segment an instance described by a natural language expression. Recent methods leverage large-scale pretrained unimodal models as backbones along with fusion techniques for joint reasoning across modalities. However, the inherent cross-modal nature of RIS raises questions about the effectiveness of unimodal backbones. We propose RISCLIP, a novel framework that effectively leverages the cross-modal nature of CLIP for RIS. Observing CLIP's inherent alignment between image …