Web: https://www.reddit.com/r/computervision/comments/sbgbtk/comparison_of_inference_time_between_convolution/

Jan. 24, 2022, 7:21 a.m. | /u/AaronSpalding

Computer Vision reddit.com

I am not very familiar with ViT (Transformer)-based networks, but I tried https://github.com/rstrudel/segmenter as a replacement for some CNN-based segmentation nets.

The performance is better, and the transformer clearly has more parameters than the CNN. However, its inference time is also slightly longer than the CNN's. Is this normal? I am not sure whether there is any rule of thumb or general conclusion about the inference speed of ViT compared with CNN, but I …
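
For anyone who wants to compare the two on their own hardware, here is a minimal timing sketch in plain Python. It is not taken from the Segmenter repo: `benchmark`, `cnn_forward`, and `vit_forward` are hypothetical names, and the two stand-in workloads only mark where real forward passes would go. It assumes each model can be wrapped in a zero-argument callable that runs one inference on a fixed input.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, runs: int = 20) -> Dict[str, float]:
    """Time a zero-argument callable; return latency statistics in milliseconds."""
    # Warm-up calls are not timed, so one-off costs (lazy initialisation,
    # cache fills, kernel selection) do not skew the measurement.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        # Note: with a GPU framework the call may return before the work is
        # finished. In PyTorch, for example, torch.cuda.synchronize() is
        # usually called inside fn so the clock covers the real compute.
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Hypothetical stand-ins: replace the bodies with one forward pass of
    # each model on the same input size and the same device.
    def cnn_forward() -> int:
        return sum(i * i for i in range(20_000))

    def vit_forward() -> int:
        return sum(i * i for i in range(40_000))

    for name, fn in [("cnn", cnn_forward), ("vit", vit_forward)]:
        stats = benchmark(fn)
        print(f"{name}: median {stats['median_ms']:.3f} ms "
              f"(min {stats['min_ms']:.3f}, max {stats['max_ms']:.3f})")
```

Reporting the median rather than the mean keeps a single slow run from dominating the result. For the comparison to be meaningful, both models should see the same input resolution, batch size, and device.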

