Web: https://www.reddit.com/r/MachineLearning/comments/xgqwvu/r_hydra_attention_efficient_attention_with_many/

Sept. 17, 2022, 4:26 p.m. | /u/Singularian2501

Machine Learning reddit.com

Paper: [https://arxiv.org/abs/2209.07484](https://arxiv.org/abs/2209.07484)


>While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult. A large reason for this is that self-attention scales quadratically with the number of tokens, which in turn, scales quadratically with the image size. On larger images (e.g., 1080p), over 60% of the total computation in the network is spent solely on creating and applying attention matrices. We take a step toward solving this issue by introducing Hydra …
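
To make the scaling claim in the quoted abstract concrete, here is a rough back-of-the-envelope sketch. It assumes a ViT-style model that splits the image into 16x16 patches; the patch size and the helper names `num_tokens` and `attention_entries` are illustrative assumptions, not taken from the post or the paper. It counts only the entries of one attention matrix, not the total FLOPs of a network.

```python
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of patch tokens for an image (patch=16 is an assumed, common choice)."""
    # Ceiling division so that a partially covered patch still counts as a token.
    rows = -(-height // patch)
    cols = -(-width // patch)
    return rows * cols


def attention_entries(tokens: int) -> int:
    """Entries in one tokens-by-tokens attention matrix (single head, single layer)."""
    return tokens * tokens


if __name__ == "__main__":
    for name, (h, w) in {"224x224": (224, 224), "1080p": (1080, 1920)}.items():
        n = num_tokens(h, w)
        print(f"{name}: {n} tokens, {attention_entries(n):,} attention entries")
```

Under these assumptions, doubling the side length of a square image multiplies the token count by 4 and the attention matrix size by 16, which is the compounding the abstract describes.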

