Jan. 28, 2024, 4:31 p.m. | /u/bkffadia

Machine Learning www.reddit.com

I work with DNA sequences as input to my deep learning model, and I save them as one-hot encoded NumPy arrays in an HDF5 (.h5) file. My dataset has 700k examples and is 500 GB in size. I want to make training faster, so I have a few questions:

- Is it better to store them as 1D arrays (numerically encoded instead of one-hot encoded) in the HDF5 file and then transform them into one-hot encoded arrays during loading? Would this make things …
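A minimal sketch of the approach the question describes, using only NumPy. It assumes the alphabet is A/C/G/T plus N for unknown bases, and the names `encode_integer`, `one_hot`, and the base-to-code mapping are hypothetical. The compact `uint8` codes (one byte per base) are what would be written to the .h5 file, for example as an h5py dataset; the one-hot expansion is a single table lookup that can run per batch at load time.

```python
import numpy as np

# Hypothetical mapping from nucleotide to integer code; 'N' (unknown base) gets code 4.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

# Byte-level lookup table: any character not listed above also maps to 4 ('N').
_LUT = np.full(256, 4, dtype=np.uint8)
for _base, _code in BASE_TO_CODE.items():
    _LUT[ord(_base)] = _code
    _LUT[ord(_base.lower())] = _code  # accept lowercase (soft-masked) bases too


def encode_integer(seq: str) -> np.ndarray:
    """Convert a DNA string into a compact 1D uint8 array (1 byte per base)."""
    return _LUT[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]


def one_hot(codes: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Expand integer codes of shape (..., L) into one-hot of shape (..., L, 4)."""
    # Rows 0-3 are the identity; row 4 is all zeros, so 'N' becomes [0, 0, 0, 0].
    table = np.vstack([np.eye(4, dtype=dtype), np.zeros((1, 4), dtype=dtype)])
    return table[codes]


# Toy "dataset": 3 sequences of length 8, stored compactly as uint8 codes.
seqs = ["ACGTACGT", "TTTTGGGG", "ACGNNCGT"]
codes = np.stack([encode_integer(s) for s in seqs])  # shape (3, 8), dtype uint8
batch = one_hot(codes)  # shape (3, 8, 4), float32, built at load time
print(codes.nbytes, batch.nbytes)  # 24 384
```

Storing one byte per base instead of a float32 one-hot row (16 bytes per base) reads 16x fewer bytes from disk, or 4x fewer than a `uint8` one-hot array. Whether that actually speeds up training depends on whether data loading, rather than the GPU, is the bottleneck.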
