May 12, 2024, 8:55 p.m. | /u/bjergerk1ng

Machine Learning www.reddit.com

Hello all! Sharing my side project here: [https://github.com/andylolu2/simpleGEMM](https://github.com/andylolu2/simpleGEMM) !

This is an *extremely* minimalistic but fast implementation of matrix multiplication in CUDA. The source code is a single, 200-line CUDA/C++ file which implements fp16 tensor core matrix multiplication, optimised for the Turing (SM75) architecture. The goals are to:

1. Write a matmul kernel that does not sacrifice performance. In fact, it's faster than PyTorch/cuBLAS if you [test it on a T4 in Colab](https://colab.research.google.com/github/andylolu2/simpleGEMM/blob/master/colab/simpleGEMM.ipynb)!
2. Make it hackable for new purposes. For …
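For readers unfamiliar with tensor-core programming, here is a minimal sketch of the underlying idea using CUDA's standard WMMA API: one warp computes one 16×16 output tile by accumulating 16×16×16 tensor-core MMAs over the K dimension. This is *not* the repo's implementation (which is more heavily optimised with tiling and shared-memory pipelining); kernel and variable names here are illustrative, and it assumes M, N, K are multiples of 16 with A row-major and B column-major.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Illustrative sketch only: each warp computes one 16x16 tile of C = A * B.
// A is MxK row-major, B is KxN column-major, C is MxN row-major, all fp16.
// Assumes M, N, K are multiples of 16 and the grid has one warp per C tile.
__global__ void wmma_matmul_sketch(const half *A, const half *B, half *C,
                                   int M, int N, int K) {
    int tileM = blockIdx.y * 16;  // row offset of this warp's C tile
    int tileN = blockIdx.x * 16;  // column offset of this warp's C tile

    // Register-resident fragments for the tensor-core MMA.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> cFrag;
    wmma::fill_fragment(cFrag, __float2half(0.0f));

    // Slide along K in 16-wide steps, accumulating into cFrag.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(aFrag, A + tileM * K + k, K);
        wmma::load_matrix_sync(bFrag, B + tileN * K + k, K);
        wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);
    }
    wmma::store_matrix_sync(C + tileM * N + tileN, cFrag, N,
                            wmma::mem_row_major);
}
```

The real kernel gets its speed from staging tiles of A and B through shared memory and overlapping loads with MMAs, but the fragment/load/mma/store loop above is the core pattern that any WMMA-based GEMM builds on.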

