March 5, 2024, 2:49 p.m. | Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid

cs.CV updates on arXiv.org arxiv.org

arXiv:2403.02041v1 Announce Type: new
Abstract: In this paper, we address web-scale visual entity recognition, specifically the task of mapping a given query image to one of the 6 million existing entities in Wikipedia. One way of approaching a problem of such scale is using dual-encoder models (eg CLIP), where all the entity names and query images are embedded into a unified space, paving the way for an approximate k-NN search. Alternatively, it is also possible to re-purpose a captioning model …

abstract arxiv clip cs.cv encoder generative image mapping paper query recognition scale type visual web wikipedia

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Business Data Scientist, gTech Ads

@ Google | Mexico City, CDMX, Mexico

Lead, Data Analytics Operations

@ Zocdoc | Pune, Maharashtra, India