Aug. 22, 2022, 2:14 p.m. | Chaim Rand

Towards Data Science (Medium) | towardsdatascience.com

How to Train Using Millions of Small, Single Sample Files

Photo by Ruben Mishchuk on Unsplash

In a typical deep learning training pipeline, data samples are iteratively loaded from a storage location and fed into the machine learning model. The model learns from these samples and updates its parameters accordingly. As a result, the speed of each training step and, by extension, the overall time to model convergence, is directly impacted by the speed at which the data samples can …
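To make the per-file loading cost concrete, below is a minimal sketch of such a pipeline in PyTorch, assuming one serialized (input, label) pair per file on local disk. The class name, directory path, and .pt file format are illustrative assumptions, not the article's actual implementation, which targets cloud storage such as Amazon S3.

import os
import torch
from torch.utils.data import Dataset, DataLoader

class SingleSampleFileDataset(Dataset):
    """Loads one training sample per file from a local directory.

    The directory layout and .pt serialization are assumptions for
    illustration; the article's storage backend (e.g. Amazon S3) and
    format may differ.
    """

    def __init__(self, root_dir):
        self.paths = sorted(
            os.path.join(root_dir, f)
            for f in os.listdir(root_dir)
            if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Each file holds a single (input, label) pair, so every
        # __getitem__ call incurs one file-open round trip -- the
        # per-file overhead that dominates when samples are small
        # and number in the millions.
        return torch.load(self.paths[idx])

if __name__ == "__main__":
    # Typical training loop: DataLoader workers fetch samples in
    # parallel, but step time is still bounded by per-file latency.
    dataset = SingleSampleFileDataset("/path/to/samples")  # hypothetical path
    loader = DataLoader(dataset, batch_size=32, num_workers=4)
    for inputs, labels in loader:
        pass  # forward/backward pass would go here

With millions of small files, the fixed cost of opening each one (and, on object storage, the per-request latency) can easily exceed the cost of reading its bytes, which is the bottleneck the article sets out to address.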

Tags: amazon-s3, cloud, cloud storage, deep learning, optimization, storage, training, training data
