May 6, 2022, 5:07 a.m. | Bhaskar Agarwal

Towards Data Science - Medium towardsdatascience.com

Part 2 of the series on AI with HPC: parallelising a CNN with Horovod and GPUs to obtain a 75x-150x speed-up.


In part 1 of the series, we saw how a few lines of Python using the multiprocessing module can give a ~1500x speed-up in IO operations. In this article, we will look at parallelising deep learning code and reducing the training time from roughly 13 hours to …

