Web: https://towardsdatascience.com/smart-distributed-training-on-amazon-sagemaker-with-smd-part-2-c833e7139b5f?source=rss----7f60cf5620c9---4

Sept. 21, 2022, 1:42 p.m. | Chaim Rand

Towards Data Science - Medium towardsdatascience.com

How to Optimize Data Distribution with SageMaker Distributed Data Parallel

Photo by Stephen on Unsplash

This is the second part of a three-part post on the topic of optimizing distributed training. In part one, we provided a brief survey of distributed training algorithms. We noted that common to all algorithms is their reliance on high-speed communication between multiple GPUs. We surmised that a distributed algorithm that accounted for the underlying instance topology, particularly the differences in the communication links …

amazon amazon sagemaker distributed distributed-training horovod machine learning part sagemaker smart tensorflow training

Research Scientists

@ ODU Research Foundation | Norfolk, Virginia

Embedded Systems Engineer (Robotics)

@ Neo Cybernetica | Bedford, New Hampshire

2023 Luis J. Alvarez and Admiral Grace M. Hopper Postdoc Fellowship in Computing Sciences

@ Lawrence Berkeley National Lab | San Francisco, CA

Senior Manager Data Scientist

@ NAV | Remote, US

Senior AI Research Scientist

@ Earth Species Project | Remote anywhere

Research Fellow- Center for Security and Emerging Technology (Multiple Opportunities)

@ University of California Davis | Washington, DC

Staff Fellow - Data Scientist

@ U.S. FDA/Center for Devices and Radiological Health | Silver Spring, Maryland

Staff Fellow - Senior Data Engineer

@ U.S. FDA/Center for Devices and Radiological Health | Silver Spring, Maryland

Data Scientist (Analytics) - Singapore

@ Momos | Singapore, Central, Singapore

Machine Learning Scientist, Drug Discovery

@ Flagship Pioneering, Inc. | Cambridge, MA

Applied Scientist - Computer Vision

@ Flawless | Los Angeles, California, United States

Sr. Data Engineer, Customer Service

@ Wayfair Inc. | Boston, MA