Sept. 21, 2022, 1:42 p.m. | Chaim Rand

Towards Data Science (Medium) | towardsdatascience.com

How to Optimize Data Distribution with SageMaker Distributed Data Parallel

Photo by Stephen on Unsplash

This is the second part of a three-part post on optimizing distributed training. In part one, we provided a brief survey of distributed training algorithms and noted that all of them rely on high-speed communication between multiple GPUs. We surmised that a distributed algorithm that accounts for the underlying instance topology, particularly the differences in the communication links …
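The full post works through TensorFlow examples on Amazon SageMaker. As a rough orientation only (this is a minimal sketch, not code from the article), the SageMaker data parallel library is typically enabled through the SageMaker Python SDK's distribution setting; the entry-point script, IAM role, and S3 paths below are placeholders.

# Minimal sketch: launch a TensorFlow training job with the SageMaker
# distributed data parallel (SMDDP) library enabled.
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",            # placeholder training script
    role="<your-sagemaker-role-arn>",  # placeholder IAM role
    instance_count=2,                  # multiple instances, so inter-node links matter
    instance_type="ml.p4d.24xlarge",   # 8 GPUs per instance
    framework_version="2.9",
    py_version="py39",
    # Enable the SageMaker data parallel library for gradient communication.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit("s3://<your-bucket>/<training-data-path>/")  # placeholder S3 input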

amazon, amazon sagemaker, distributed, distributed-training, machine learning, sagemaker, tensorflow, training
