May 2, 2022, 1:11 a.m. | Siyu Ren, Kenny Q. Zhu

cs.CL updates on arXiv.org arxiv.org

Current text-image approaches (e.g., CLIP) typically adopt a dual-encoder architecture using pre-trained vision-language representations. However, these models still pose non-trivial memory requirements and substantial incremental indexing time, which makes them less practical on mobile devices. In this paper, we present an effective two-stage framework to compress large pre-trained dual-encoders for lightweight text-image retrieval. The resulting model is smaller (39% of the original), faster (1.6x/2.9x for processing images/text respectively), yet performs on par with or better than …
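The dual-encoder setup the paper compresses can be sketched in a few lines: images and texts are embedded independently, so the image gallery can be indexed once offline and each query only requires one text-encoder pass plus a dot-product search. The encoders below are hypothetical stand-ins (fixed random projections, not the actual pre-trained CLIP towers), purely to illustrate the retrieval mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, EMB = 128, 32
W_image = rng.standard_normal((DIM, EMB))  # stand-in for the image encoder
W_text = rng.standard_normal((DIM, EMB))   # stand-in for the text encoder

def encode(features, W):
    """Project raw features and L2-normalize, as dual encoders do."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def retrieve(text_feat, image_feats, k=3):
    """Rank gallery images by cosine similarity to the text query."""
    q = encode(text_feat, W_text)           # (EMB,) query embedding
    gallery = encode(image_feats, W_image)  # (N, EMB) indexed offline
    scores = gallery @ q                    # cosine similarity (unit vectors)
    return np.argsort(-scores)[:k]          # indices of the top-k images

# Usage: index 10 images offline, then score one text query against them.
images = rng.standard_normal((10, DIM))
query = rng.standard_normal(DIM)
top = retrieve(query, images)
```

Because the two towers never interact until the final dot product, shrinking either encoder directly cuts both the on-device memory footprint and the per-item incremental indexing cost that the abstract highlights.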

