Dissecting Query-Key Interaction in Vision Transformers
May 27, 2024, 4:46 a.m. | Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz
cs.CV updates on arXiv.org
Abstract: Self-attention in vision transformers has been thought to perform perceptual grouping where tokens attend to other tokens with similar embeddings, which could correspond to semantically similar features in an image. However, contextualization is also an important and necessary computation for processing signals. Contextualization potentially requires tokens to attend to dissimilar tokens such as those corresponding to backgrounds or different objects, but this effect has not been reported in previous studies. In this study, we investigate …
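The grouping-versus-contextualization distinction the abstract draws can be seen in the basic query-key computation itself. The sketch below is purely illustrative (toy embeddings and, as a simplifying assumption, identity query/key projections, not anything from the paper): with such projections, scaled dot-product attention gives the highest weight to tokens with similar embeddings, i.e. the perceptual-grouping pattern; a learned projection can instead route attention toward dissimilar tokens, which is the contextualization effect the authors investigate.

```python
import numpy as np

# Toy token embeddings: tokens 0 and 1 are similar, token 2 is dissimilar
# (e.g., an object token vs. a background token).
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])

# Simplifying assumption: identity query/key projections.
W_q = np.eye(2)
W_k = np.eye(2)

Q, K = X @ W_q, X @ W_k

# Scaled dot-product attention scores, then row-wise softmax.
scores = Q @ K.T / np.sqrt(K.shape[1])
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Token 0 attends most strongly to the similar tokens (0 and 1) and least
# to the dissimilar token 2 -- the perceptual-grouping pattern. A trained
# W_q, W_k need not preserve this; it can direct queries toward dissimilar
# keys instead, producing the contextualization behavior studied here.
print(attn.round(2))
```

With non-identity learned projections, the same machinery can invert this preference, which is why examining the query-key interaction directly (rather than raw embedding similarity) matters for interpreting what attention heads do.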