Composing Object Relations and Attributes for Image-Text Matching
June 19, 2024, 2:45 a.m. | Khoi Pham, Chuong Huynh, Ser-Nam Lim, Abhinav Shrivastava
cs.CV updates on arXiv.org
Abstract: We study the visual semantic embedding problem for image-text matching. Most existing work uses a tailored cross-attention mechanism to perform local alignment between the image and text modalities. This is more powerful than the unimodal dual-encoder approach, but it is computationally expensive. This work introduces a dual-encoder image-text matching model that leverages a scene graph to represent captions, with nodes for objects and attributes interconnected by relational edges. Utilizing a graph attention network, our …
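To make the two ideas in the abstract concrete, here is a minimal sketch of a caption scene graph and of dual-encoder retrieval. It is an illustration under stated assumptions, not the authors' implementation: the `SceneGraph` class, the example caption, and the hand-written embedding vectors are all hypothetical, and the graph attention network that would turn the graph into a text embedding is not implemented.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical container for a caption parsed into a scene graph."""
    objects: list = field(default_factory=list)      # object nodes
    attributes: list = field(default_factory=list)   # (object, attribute) pairs
    relations: list = field(default_factory=list)    # (subject, predicate, object) edges


# Assumed example caption: "a brown dog chases a red ball"
graph = SceneGraph(
    objects=["dog", "ball"],
    attributes=[("dog", "brown"), ("ball", "red")],
    relations=[("dog", "chases", "ball")],
)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_embedding, image_embeddings):
    """Return image indices sorted from best to worst match.

    In a dual-encoder setup each image is embedded once, independently
    of any caption, so matching reduces to one similarity per pair.
    """
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Placeholder embeddings, NOT produced by any model.
text_embedding = [1.0, 0.0]
image_embeddings = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_images(text_embedding, image_embeddings)
```

The design point this illustrates is the cost difference the abstract mentions: a cross-attention model must process every image-caption pair jointly, whereas a dual encoder can cache the image embeddings and compare them to a new caption with a cheap vector similarity.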