Nov. 3, 2022, 1:14 a.m. | Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang

cs.CV updates on arXiv.org

Inspired by the success of vision-language methods (VLMs) in zero-shot
classification, recent works attempt to extend this line of work to object
detection by leveraging the localization ability of pre-trained VLMs and
generating pseudo labels for unseen classes in a self-training manner. However,
because current VLMs are usually pre-trained by aligning sentence embeddings
with global image embeddings, using them directly lacks the fine-grained
alignment for object instances that is the core of detection. In this paper,
we propose …
