April 18, 2024, 4:45 a.m. | Dan Song, Xinwei Fu, Weizhi Nie, Wenhui Li, Lanjun Wang, You Yang, Anan Liu

cs.CV updates on arXiv.org

arXiv:2311.18402v2 Announce Type: replace
Abstract: Large-scale pre-trained models have demonstrated impressive performance on vision and language tasks in open-world scenarios. Because no comparable pre-trained models exist for 3D shapes, recent methods leverage language-image pre-training to realize zero-shot 3D shape recognition. However, due to the modality gap, pre-trained language-image models are not confident enough when generalizing to 3D shape recognition. This paper therefore aims to improve that confidence through view selection and hierarchical prompts. Leveraging the CLIP model …
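The pipeline the abstract describes (rendering a 3D shape into multiple 2D views, scoring each view against class prompts with CLIP, and aggregating over selected views) can be sketched as below. This is a minimal illustration built on the openai/CLIP package: the class names and prompt templates are placeholders, and the entropy-based view selection is a simplified stand-in for the paper's confidence-aware view selection and hierarchical prompts, not their exact formulation.

# Minimal sketch of zero-shot 3D shape recognition via multi-view CLIP.
# Assumes pre-rendered views of each shape; view selection by prediction
# entropy is an illustrative heuristic, not the paper's exact method.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["airplane", "chair", "lamp"]  # placeholder label set
# Coarse and finer, view-aware prompt wordings (hypothetical templates,
# standing in for the paper's hierarchical prompts).
templates = [
    "a photo of a {}.",
    "a rendered view of a 3D model of a {}.",
]

with torch.no_grad():
    text_feats = []
    for name in class_names:
        tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
        feats = model.encode_text(tokens)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        text_feats.append(feats.mean(dim=0))  # average over prompt templates
    text_feats = torch.stack(text_feats)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def classify_shape(view_images, keep_ratio=0.5):
    """view_images: list of PIL images rendered from one 3D shape."""
    with torch.no_grad():
        imgs = torch.stack([preprocess(im) for im in view_images]).to(device)
        img_feats = model.encode_image(imgs)
        img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
        logits = 100.0 * img_feats @ text_feats.T  # (num_views, num_classes)
        probs = logits.softmax(dim=-1)
        # View selection: keep the views whose predictions are most
        # confident (lowest entropy), then average their probabilities.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        k = max(1, int(keep_ratio * len(view_images)))
        keep = entropy.topk(k, largest=False).indices
        return probs[keep].mean(dim=0).argmax().item()

In use, classify_shape would receive a ring of views (say, 12 renders around the shape); keep_ratio controls how aggressively low-confidence views are discarded before the per-view predictions are averaged.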

