March 25, 2024, 4:45 a.m. | Beichen Zhang, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Jiaqi Wang

cs.CV updates on arXiv.org

arXiv:2403.15378v1 Announce Type: new
Abstract: Contrastive Language-Image Pre-training (CLIP) has been the cornerstone of zero-shot classification, text-image retrieval, and text-image generation by aligning the image and text modalities. Despite its widespread adoption, a significant limitation of CLIP is its restricted text-input length: input is capped at 77 tokens, and an empirical study shows the actual effective length is even less than 20 tokens. This prevents CLIP from handling detailed descriptions, limiting its applications for image …
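To make the 77-token cap cited in the abstract concrete, here is a minimal sketch (an illustration added for this post, not code from the paper) using the Hugging Face transformers CLIPTokenizer. The checkpoint name is an assumption; any standard CLIP checkpoint exposes the same limit.

```python
# Minimal sketch of CLIP's 77-token text-input cap, assuming the Hugging Face
# `transformers` library and the "openai/clip-vit-base-patch32" checkpoint.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# model_max_length reflects CLIP's 77-position text context window.
print(tokenizer.model_max_length)  # 77

# A long, detailed caption far exceeding 77 tokens.
long_caption = "a detailed description of the scene " * 40

encoded = tokenizer(
    long_caption,
    truncation=True,  # everything past the 77th token is silently dropped
    max_length=tokenizer.model_max_length,
)
print(len(encoded.input_ids))  # 77 -- the rest of the caption never reaches the model
```

Note that truncation happens before the text encoder ever sees the input, which is why detailed descriptions are lost regardless of how the downstream model is used.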
