Web: https://www.reddit.com/r/computervision/comments/sbgbtk/comparison_of_inference_time_between_convolution/

Jan. 24, 2022, 7:21 a.m. | /u/AaronSpalding

Computer Vision reddit.com

I am not very familiar with ViT (Transformer) based networks, but I tried https://github.com/rstrudel/segmenter as a replacement for some CNN-based segmentation nets.

The performance is better, and the transformer clearly has more parameters than the CNN. However, its inference time is also slightly longer than the CNN's. Is that normal? I am not sure whether there is a rule of thumb or an established conclusion about the inference speed of ViT compared with CNN, but I …
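For anyone who wants to reproduce this kind of comparison, the measurement itself is worth pinning down first, since the first few forward passes are often slower than steady state and GPU calls can return before the work has finished. Below is a minimal timing sketch in plain Python, not taken from the Segmenter repo. The names `benchmark`, `warmup`, `runs`, `sync` and `dummy_model` are hypothetical and only for illustration. It assumes each model can be wrapped in a callable, and that on a GPU a synchronization function is passed as `sync`.

```python
import statistics
import time


def benchmark(fn, *args, warmup=5, runs=20, sync=None):
    """Time fn(*args) and return latency statistics in milliseconds.

    warmup: untimed calls made first, so one-off setup costs are excluded.
    sync:   optional zero-argument callable that blocks until pending work
            is done (assumption: needed for asynchronous backends like GPUs).
    """
    for _ in range(warmup):
        fn(*args)
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # wait for the call to really complete
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "median_ms": statistics.median(timings_ms),
        "mean_ms": statistics.fmean(timings_ms),
    }


def dummy_model(n):
    """Stand-in workload; replace with a real forward pass."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    stats = benchmark(dummy_model, 100_000)
    print(f"median {stats['median_ms']:.3f} ms over {stats['runs']} runs")
```

With PyTorch, the same helper could wrap each model's forward pass (run under `torch.no_grad()` in eval mode) and take `torch.cuda.synchronize` as `sync`. Both models should get the same input size and batch size, otherwise the numbers are not comparable.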


