Feb. 1, 2024, 12:42 p.m. | Hao Tan Zichang Tan Jun Li Jun Wan Zhen Lei

cs.CV updates on arXiv.org arxiv.org

Multi-label image recognition is a fundamental task in computer vision. Recently, vision-language models have made notable advancements in this area. However, previous methods often failed to effectively leverage the rich knowledge within language models and instead incorporated label semantics into visual features in a unidirectional manner. In this paper, we propose a Prompt-driven Visual-Linguistic Representation Learning (PVLR) framework to better leverage the capabilities of the linguistic modality. In PVLR, we first introduce a dual-prompting strategy comprising Knowledge-Aware Prompting (KAP) and …

computer computer vision cs.cv features image image recognition knowledge language language models paper prompt recognition representation representation learning semantics vision vision-language models visual

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US