Web: http://arxiv.org/abs/2206.06829

June 16, 2022, 1:13 a.m. | Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen (Tencent Youtu Lab)

cs.CV updates on arXiv.org

Vision transformers (ViTs) are changing the landscape of object detection
approaches. A natural use of ViTs in detection is to replace the CNN-based
backbone with a transformer-based backbone, which is straightforward and
effective but comes at the price of a considerable computational burden at
inference. A more subtle use is the DETR family, which eliminates the need for
many hand-designed components in object detection but introduces a decoder
that takes an extra-long time to converge. As a result, transformer-based object
detection cannot prevail …


