May 27, 2024, 4:46 a.m. | Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz

cs.CV updates on arXiv.org

arXiv:2405.14880v1 Announce Type: new
Abstract: Self-attention in vision transformers has been thought to perform perceptual grouping where tokens attend to other tokens with similar embeddings, which could correspond to semantically similar features in an image. However, contextualization is also an important and necessary computation for processing signals. Contextualization potentially requires tokens to attend to dissimilar tokens such as those corresponding to backgrounds or different objects, but this effect has not been reported in previous studies. In this study, we investigate …

