Oct. 18, 2023, 1:56 a.m. | /u/bjergerk1ng

Machine Learning www.reddit.com

Hi r/MachineLearning! I recently went down the CUDA programming rabbit hole. Along the way, I came across matrix multiplication and was amazed by how complicated the algorithm is in CUDA (especially if you want the best performance). I found the learning process quite gruelling (the CUDA docs were very average), so I wrote a short blog post that will hopefully help anyone in the same position.

You can read the blog on [Medium (no paywall)](https://towardsdatascience.com/matrix-multiplication-on-the-gpu-e920e50207a8?source=friends_link&sk=020a915e1fce7d910aacda22bce89129) or [HackMD](https://hackmd.io/@andylo/matrix-multiplication-on-gpu). It would probably …

