Jan. 28, 2024, 4:31 p.m. | /u/bkffadia

Deep Learning www.reddit.com

I work with DNA sequences as input to my deep learning model; I save them as one-hot encoded NumPy arrays in an HDF5 file. My dataset has 700k examples and is 500 GB in size. I wanted to make training faster, so I have a bunch of questions:

- is it better to store them as 1D arrays (integer-encoded instead of one-hot) in the HDF5 file and then transform them into one-hot encoded arrays during loading? Would this make things …
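Not the OP's code, but a minimal sketch of that first option, assuming PyTorch + h5py and an A/C/G/T → 0..3 integer encoding; the dataset keys and class name are hypothetical:

    import h5py
    import numpy as np
    import torch
    from torch.utils.data import Dataset

    class DnaDataset(Dataset):
        """Reads integer-encoded DNA sequences (0=A, 1=C, 2=G, 3=T) from an HDF5
        file and one-hot encodes each sequence on the fly."""

        def __init__(self, h5_path, seq_key="sequences", label_key="labels"):
            self.h5_path = h5_path
            self.seq_key = seq_key
            self.label_key = label_key
            self.h5 = None  # opened lazily so each DataLoader worker gets its own handle
            with h5py.File(h5_path, "r") as f:
                self.length = f[seq_key].shape[0]

        def __len__(self):
            return self.length

        def __getitem__(self, idx):
            if self.h5 is None:
                self.h5 = h5py.File(self.h5_path, "r")
            seq = self.h5[self.seq_key][idx]      # shape (L,), dtype uint8
            label = self.h5[self.label_key][idx]
            # (L,) int -> (4, L) float32 one-hot, built per sample at load time
            one_hot = np.eye(4, dtype=np.float32)[seq].T
            return torch.from_numpy(one_hot), torch.tensor(label, dtype=torch.float32)

Storing uint8 integer codes instead of float one-hot vectors should shrink the file by roughly 4x (or more, depending on the one-hot dtype), at the cost of a cheap per-sample encoding step in the loader.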

