June 19, 2024, 2:45 a.m. | Khoi Pham, Chuong Huynh, Ser-Nam Lim, Abhinav Shrivastava

cs.CV updates on arXiv.org

arXiv:2406.11820v1 Announce Type: new
Abstract: We study the visual semantic embedding problem for image-text matching. Most existing work uses a tailored cross-attention mechanism to perform local alignment across the image and text modalities. This is more powerful than the unimodal dual-encoder approach, but it is computationally expensive. This work introduces a dual-encoder image-text matching model that leverages a scene graph to represent captions, with nodes for objects and attributes interconnected by relational edges. Utilizing a graph attention network, our …
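To make the caption scene graph concrete, here is a minimal sketch of such a structure in plain Python. It is an illustration only, not the authors' implementation: the `SceneGraph` class, the `attribute_of` edge label, and the hand-built example caption ("a brown dog chasing a red ball") are all assumptions introduced here. In practice a scene graph parser would produce the nodes and edges, and a graph attention network would operate on each node's neighborhood.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical container for a caption scene graph (illustrative only)."""
    # node id -> (label, kind), where kind is "object" or "attribute"
    nodes: dict = field(default_factory=dict)
    # directed edges stored as (source id, relation label, target id)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, label, kind):
        self.nodes[node_id] = (label, kind)

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id):
        """Sorted ids of nodes connected to node_id in either direction."""
        outgoing = [d for s, _, d in self.edges if s == node_id]
        incoming = [s for s, _, d in self.edges if d == node_id]
        return sorted(set(outgoing + incoming))


def build_example():
    """Hand-built graph for the assumed caption 'a brown dog chasing a red ball'."""
    g = SceneGraph()
    g.add_node(0, "dog", "object")
    g.add_node(1, "ball", "object")
    g.add_node(2, "brown", "attribute")
    g.add_node(3, "red", "attribute")
    g.add_edge(2, "attribute_of", 0)  # assumed label for attribute edges
    g.add_edge(3, "attribute_of", 1)
    g.add_edge(0, "chasing", 1)       # relational edge between two objects
    return g
```

Calling `build_example().neighbors(0)` returns the nodes a graph attention layer would attend over when updating the "dog" node: the "ball" object and the "brown" attribute.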

