April 13, 2023, 11:46 p.m. | /u/incrapnito

Computer Vision www.reddit.com

I am sharing my from-scratch PyTorch implementation of the Vision Transformer (ViT). It includes a detailed step-by-step guide to self-attention and the model specifics, intended for learning how Vision Transformers work. The network is a small, scaled-down version of the original architecture and achieves around 99.4% test accuracy on MNIST and 92.5% on FashionMNIST.

Hope you find it helpful. Feedback is appreciated.

GitHub: [https://github.com/s-chh/PyTorch-Vision-Transformer-ViT-MNIST](https://github.com/s-chh/PyTorch-Vision-Transformer-ViT-MNIST)
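
For readers who want a quick feel for the two pieces mentioned above before opening the repo, here is a minimal sketch of patch embedding followed by single-head self-attention on MNIST-sized inputs. It is not taken from the repository: the class names `TinyPatchEmbedding` and `TinySelfAttention`, the patch size of 4, and the embedding width of 64 are all assumptions chosen for illustration, and the linked code may be organized quite differently.

```python
import torch
import torch.nn as nn


class TinyPatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to a vector.

    Hypothetical helper for illustration; patch_size and embed_dim are assumed values.
    """

    def __init__(self, in_channels=1, patch_size=4, embed_dim=64):
        super().__init__()
        # A convolution with stride == kernel size is equivalent to cutting the
        # image into patches and applying one shared linear layer to each patch.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, images):
        x = self.proj(images)                # (batch, embed_dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)  # (batch, num_patches, embed_dim)


class TinySelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (illustrative only)."""

    def __init__(self, embed_dim):
        super().__init__()
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        self.value = nn.Linear(embed_dim, embed_dim)
        self.scale = embed_dim ** -0.5

    def forward(self, x):
        # x: (batch, tokens, embed_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Similarity of every token with every other token, scaled for stability.
        scores = (q @ k.transpose(-2, -1)) * self.scale  # (batch, tokens, tokens)
        weights = scores.softmax(dim=-1)                 # each row sums to 1
        return weights @ v                               # (batch, tokens, embed_dim)


# Two random MNIST-sized images: 1 channel, 28x28 pixels.
images = torch.randn(2, 1, 28, 28)
tokens = TinyPatchEmbedding()(images)     # 28 / 4 = 7, so 7 * 7 = 49 patches
attended = TinySelfAttention(64)(tokens)
print(tokens.shape, attended.shape)
```

A full ViT would also add a classification token, positional embeddings, multiple heads, and stacked encoder blocks with MLP layers; those are left out here to keep the sketch short.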
