CLIP-Count: Towards Text-Guided Zero-Shot Object Counting. (arXiv:2305.07304v1 [cs.CV])
cs.CV updates on arXiv.org
Recent advances in visual-language models have shown remarkable zero-shot
text-image matching ability that is transferable to downstream tasks such as
object detection and segmentation. However, adapting these models for object
counting, which involves estimating the number of objects in an image, remains
a formidable challenge. In this study, we conduct the first exploration of
transferring visual-language models for class-agnostic object counting.
Specifically, we propose CLIP-Count, a novel pipeline that estimates density
maps for open-vocabulary objects with text guidance in a …
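
To make the density-map idea concrete, here is a minimal sketch in plain Python. It is not the CLIP-Count model and uses no text guidance. It only illustrates the usual convention in density-based counting, where each object contributes a blob of unit mass and the count is read off as the sum over the map. The helper names (`gaussian_kernel`, `make_density_map`, `count_from_density`), the kernel size, and the toy point coordinates are all assumptions made for illustration.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    center = (size - 1) / 2
    kernel = [
        [
            math.exp(-((r - center) ** 2 + (c - center) ** 2) / (2 * sigma ** 2))
            for c in range(size)
        ]
        for r in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def make_density_map(height, width, points, size=5, sigma=1.0):
    """Build a toy density map: one unit-mass Gaussian blob per object location.

    Assumption: points are (row, col) pairs. Blobs that overlap the image
    border are clipped, which would lose a little mass.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    density = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        for kr in range(size):
            for kc in range(size):
                r, c = pr + kr - half, pc + kc - half
                if 0 <= r < height and 0 <= c < width:
                    density[r][c] += kernel[kr][kc]
    return density


def count_from_density(density):
    """Estimate the object count as the sum of all density values."""
    return sum(sum(row) for row in density)


# Hypothetical object locations, all far enough from the border to avoid clipping.
points = [(5, 5), (10, 12), (14, 7)]
density = make_density_map(20, 20, points)
estimated_count = count_from_density(density)
print(round(estimated_count, 6))
```

In a learned counter, a network would predict `density` from the image (and, in a text-guided setting, from the prompt) instead of building it from known points. The summation step that turns the map into a count would stay the same.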