EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention. (arXiv:2305.07027v1 [cs.CV])
cs.CV updates on arXiv.org
Vision transformers have shown great success due to their high model
capacity. However, their remarkable performance is accompanied by heavy
computation costs, which makes them unsuitable for real-time applications. In
this paper, we propose a family of high-speed vision transformers named
EfficientViT. We find that the speed of existing transformer models is commonly
bounded by memory-inefficient operations, especially the tensor reshaping and
element-wise functions in MHSA. Therefore, we design a new building block with
a sandwich layout, i.e., using …
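To see where the memory-inefficient operations the abstract points to come from, here is a minimal NumPy sketch of vanilla multi-head self-attention (MHSA), not the paper's EfficientViT block: the reshape/transpose steps and the element-wise softmax move a lot of data without doing useful arithmetic, which is what tends to bound speed on real hardware. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax (element-wise exp is one of the
    # memory-bound operations the abstract mentions)
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(x, wq, wk, wv, wo, num_heads):
    """Vanilla multi-head self-attention (illustrative, not EfficientViT)."""
    b, n, d = x.shape
    dh = d // num_heads
    q, k, v = x @ wq, x @ wk, x @ wv
    # tensor reshaping: (b, n, d) -> (b, heads, n, dh);
    # pure data movement, no arithmetic -- memory-bound on accelerators
    q = q.reshape(b, n, num_heads, dh).transpose(0, 2, 1, 3)
    k = k.reshape(b, n, num_heads, dh).transpose(0, 2, 1, 3)
    v = v.reshape(b, n, num_heads, dh).transpose(0, 2, 1, 3)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh))
    # transpose + reshape back: (b, heads, n, dh) -> (b, n, d)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(b, n, d)
    return out @ wo

rng = np.random.default_rng(0)
b, n, d, h = 2, 16, 32, 4  # hypothetical batch, tokens, dim, heads
x = rng.standard_normal((b, n, d))
wq, wk, wv, wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
y = mhsa(x, wq, wk, wv, wo, num_heads=h)
print(y.shape)  # (2, 16, 32)
```

EfficientViT's contribution, per the abstract, is a building block designed to avoid being dominated by these reshaping and element-wise steps; the sketch above only shows where they occur in the standard formulation.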