CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

March 20, 2024, 4:45 a.m. | Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang

cs.CV updates on arXiv.org arxiv.org

arXiv:2403.12455v1
Abstract: Open-vocabulary video instance segmentation strives to segment and track instances belonging to an open set of categories in a video. The vision-language model Contrastive Language-Image Pre-training (CLIP) has shown strong zero-shot classification ability in image-level open-vocabulary tasks. In this paper, we propose a simple encoder-decoder network, called CLIP-VIS, to adapt CLIP for open-vocabulary video instance segmentation. Our CLIP-VIS adopts a frozen CLIP image encoder and introduces three modules, including class-agnostic mask generation, temporal topK-enhanced matching, and …
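
The zero-shot classification ability the abstract attributes to CLIP can be illustrated with a minimal sketch. This is not the CLIP-VIS method, only the generic idea of scoring an image embedding against text embeddings of candidate category names. The embeddings are hand-made toy vectors, and the function names and the temperature value are illustrative assumptions rather than anything taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities.

    text_embs maps a category name to its (hypothetical) text embedding.
    The temperature is an assumed value chosen for illustration.
    """
    names = list(text_embs)
    logits = [cosine(image_emb, text_embs[n]) / temperature for n in names]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Toy embeddings standing in for real encoder outputs (assumption).
image_emb = [0.9, 0.1, 0.0]
text_embs = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

probs = zero_shot_classify(image_emb, text_embs)
best = max(probs, key=probs.get)
print(best, probs)
```

Because the category set is just a dictionary of text embeddings, new categories can be added at inference time without retraining, which is what makes the setting open-vocabulary.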
