Web: https://www.reddit.com/r/deeplearning/comments/uipd9k/serverless_training_for_parallel_training/

May 5, 2022, 4:31 a.m. | /u/scb_11

Deep Learning reddit.com

Hi all,

Has anyone explored serverless training on AWS?

We run around 100 experiments on GPUs, and all of the scheduling and server allocation is static. We want to explore submitting training jobs dynamically instead. We tried AWS SageMaker, but it requires a container built according to its own rules. Is there a service where I can schedule runs directly and get the weight files back?

P.S. Most of the data is images and the task is segmentation, so GPUs are needed.
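To make the static vs. dynamic distinction concrete, here is a minimal local sketch, not an AWS or SageMaker API. It assumes a fixed pool of GPUs (`NUM_GPUS` is a hypothetical value) and a hypothetical `run_experiment` stand-in for the real segmentation training run. Instead of pinning each experiment to a server in advance, each experiment grabs whichever GPU frees up next and returns the path of its weight file.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 4  # hypothetical pool size; adjust to the machines you actually have


def run_experiment(config, gpu_id):
    """Hypothetical stand-in for a real training run.

    A real version would pin the job to `gpu_id` (e.g. via CUDA_VISIBLE_DEVICES),
    train the segmentation model, and save the weights. Here it only returns
    the path where the weights would be written.
    """
    return f"weights/{config['name']}_gpu{gpu_id}.pt"


def dispatch(configs, num_gpus=NUM_GPUS):
    """Run every config, handing each one the next free GPU (dynamic allocation)."""
    free_gpus = queue.Queue()
    for gpu_id in range(num_gpus):
        free_gpus.put(gpu_id)

    def worker(config):
        gpu_id = free_gpus.get()  # blocks until some GPU is free
        try:
            return run_experiment(config, gpu_id)
        finally:
            free_gpus.put(gpu_id)  # return the GPU to the pool for the next job

    # One worker thread per GPU; pool.map keeps results in submission order.
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        return list(pool.map(worker, configs))


if __name__ == "__main__":
    configs = [{"name": f"exp{i:03d}"} for i in range(100)]
    paths = dispatch(configs)
    print(len(paths))  # → 100
```

The same pattern carries over to a managed service: the local GPU queue is replaced by the service's own job queue, and `run_experiment` becomes a job submission whose output is the weight file.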

