May 1, 2024, 4:45 a.m. | Dongyun Lin, Yi Cheng, Shangbo Mao, Aiyuan Guo, Yiqun Li

cs.CV updates on arXiv.org

arXiv:2404.19168v1 Announce Type: new
Abstract: Large vision-language models have impressively promote the performance of 2D visual recognition under zero/few-shot scenarios. In this paper, we focus on exploiting the large vision-language model, i.e., CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of the 3D shape represented by multiple view images under the scenarios of either without explicit training (zero-shot 3D shape recognition) or training with a …
