HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval. (arXiv:2205.12105v1 [cs.CV])
cs.CV updates on arXiv.org arxiv.org
In the past few years, the emergence of vision-language pre-training (VLP)
has brought cross-modal retrieval to a new era. However, because of its latency and
computational demands, VLP is often difficult to apply in a real-time
online retrieval system. To address this limitation, this paper proposes
Hierarchical Vision-Language Pre-Training
(HiVLP) for fast Image-Text Retrieval (ITR). Specifically, we design a
novel hierarchical retrieval objective, which uses representations of
different dimensionalities for coarse-to-fine ITR, i.e., using low-dimensional
representation for …
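
The abstract is cut off here, so the exact HiVLP procedure is not available. As a rough illustration of what coarse-to-fine retrieval generally means, the sketch below scores the whole gallery with cheap low-dimensional embeddings, then rescores only a shortlist with higher-dimensional embeddings. The two-stage split, the cosine scoring, and the names `coarse_to_fine_search`, `shortlist_size`, and `top_k` are assumptions for illustration and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_to_fine_search(query_low, query_high, gallery_low, gallery_high,
                          shortlist_size, top_k):
    """Hypothetical two-stage retrieval; returns gallery indices, best first.

    Assumes each gallery item has both a low-dimensional and a
    high-dimensional embedding, stored at the same index in the two lists.
    """
    # Stage 1 (coarse): rank every gallery item using the low-dimensional vectors.
    coarse_order = sorted(
        range(len(gallery_low)),
        key=lambda i: cosine(query_low, gallery_low[i]),
        reverse=True,
    )
    shortlist = coarse_order[:shortlist_size]

    # Stage 2 (fine): rerank only the shortlist using the high-dimensional vectors.
    fine_order = sorted(
        shortlist,
        key=lambda i: cosine(query_high, gallery_high[i]),
        reverse=True,
    )
    return fine_order[:top_k]


# Toy data: three gallery items, each with a 2-d and a 4-d embedding.
gallery_low = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
gallery_high = [[1.0, 0.0, 0.0, 0.0], [0.6, 0.8, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

best = coarse_to_fine_search(
    query_low=[1.0, 0.05],
    query_high=[0.6, 0.8, 0.0, 0.0],
    gallery_low=gallery_low,
    gallery_high=gallery_high,
    shortlist_size=2,
    top_k=1,
)
print(best)
```

In this toy setup the coarse stage keeps items 0 and 1, and the fine stage promotes item 1 because its high-dimensional embedding matches the query exactly. The cost saving in such a scheme comes from running the expensive comparison on `shortlist_size` items instead of the full gallery.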