[D] Best practices for storing multi-TB image datasets for use w/ PyTorch
Feb. 9, 2024, 5:58 p.m. | /u/ppg_dork
Machine Learning www.reddit.com
I'm working with a moderately large deep learning dataset (~4.4 TB) of satellite image data. Currently, I have the data stored as NPZ files. Each NPZ file contains the response labels and a time series of imagery.
After digging around, it seems like storing the data in HDF5 might be a better alternative and improve random read speed.
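For anyone wondering what the HDF5 route could look like, here is a minimal sketch, not a benchmark and not a claim that it will be faster for this dataset. It assumes `h5py` and NumPy are installed. The dataset names `images` and `labels`, the class name `HDF5TimeSeriesDataset`, and the tiny array shapes are all made up for illustration. Each sample is stored as its own chunk so that a random read touches only that sample, and the file handle is opened lazily on first access, a common pattern so that each data-loading worker process ends up with its own handle.

```python
import os
import tempfile

import h5py
import numpy as np


def write_hdf5(path, images, labels):
    """Write all samples into one HDF5 file, one chunk per sample."""
    with h5py.File(path, "w") as f:
        # chunks=(1, ...) means reading images[i] touches a single chunk
        f.create_dataset("images", data=images, chunks=(1,) + images.shape[1:])
        f.create_dataset("labels", data=labels)


class HDF5TimeSeriesDataset:
    """Map-style dataset: implements __len__ and __getitem__ (hypothetical name)."""

    def __init__(self, path):
        self.path = path
        self._file = None  # opened lazily on first __getitem__ call
        with h5py.File(path, "r") as f:
            self._length = f["images"].shape[0]

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        image = self._file["images"][idx]  # reads only the requested sample
        label = self._file["labels"][idx]
        return image, label


# Synthetic stand-in data: 8 samples, 4 time steps, 3 bands, 16x16 pixels
rng = np.random.default_rng(0)
images = rng.random((8, 4, 3, 16, 16), dtype=np.float32)
labels = np.arange(8, dtype=np.int64)

tmp_dir = tempfile.mkdtemp()
h5_path = os.path.join(tmp_dir, "demo.h5")
write_hdf5(h5_path, images, labels)

dataset = HDF5TimeSeriesDataset(h5_path)
sample_image, sample_label = dataset[5]
```

In a PyTorch project the class would typically subclass `torch.utils.data.Dataset` and be passed to a `DataLoader`. Whether this actually beats NPZ depends on chunk shape, compression, and the storage backend, so it is worth timing random reads on a small subset first.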
Does anyone have a suggestion for resources on best practices for managing large datasets? The information coming up on Google …