MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation. (arXiv:2304.06957v1 [cs.CV])
cs.CV updates on arXiv.org arxiv.org
CLIP (Contrastive Language-Image Pretraining) is well established for
open-vocabulary zero-shot image-level recognition, but its application to
pixel-level tasks is less explored, and most efforts adopt CLIP features
directly, without deliberate adaptation. In this work, we first demonstrate
the necessity of image-to-pixel CLIP feature adaptation, then propose Multi-View
Prompt learning (MVP-SEG) as an effective solution that achieves image-to-pixel
adaptation and addresses open-vocabulary semantic segmentation. Concretely,
MVP-SEG deliberately learns multiple prompts trained by our Orthogonal
Constraint Loss (OCLoss), by which each prompt …
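
The excerpt is cut off before OCLoss is defined, so the following is only a minimal sketch of a generic orthogonality penalty over several prompt embeddings, not the paper's formulation. The function name `orthogonal_penalty`, the choice of mean absolute cosine similarity, and the `(num_prompts, dim)` layout are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_penalty(prompts: np.ndarray) -> float:
    """Mean absolute cosine similarity between distinct prompt embeddings.

    Hypothetical stand-in for an orthogonality constraint; the actual
    OCLoss is not specified in the excerpt above.

    prompts: array of shape (num_prompts, dim), one embedding per row.
    Returns 0.0 when all prompts are mutually orthogonal.
    """
    if prompts.ndim != 2 or prompts.shape[0] < 2:
        raise ValueError("expected at least two prompt embeddings, shape (n, dim)")

    # L2-normalise each row so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(prompts, axis=1, keepdims=True)
    unit = prompts / np.clip(norms, 1e-12, None)
    gram = unit @ unit.T

    # Keep only off-diagonal entries: similarity between different prompts.
    n = gram.shape[0]
    off_diag = gram[~np.eye(n, dtype=bool)]
    return float(np.abs(off_diag).mean())
```

Minimising a term like this alongside the segmentation loss would push the learned prompts toward distinct directions in embedding space; how the authors weight or formulate their constraint is not stated in the excerpt.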