May 24, 2023, 9:12 p.m. | /u/jesst177

Deep Learning www.reddit.com

Hi!

I am trying to improve the memory and speed efficiency of our PyTorch training pipeline. While inspecting it, I noticed that our GPU goes idle after every epoch (visual at the end).

Our environment is:

* 2x V100 (Azure Cloud).
* PyTorch 1.13.
* CUDA 11.6.
* AMP is activated.
* Number of workers is 8.
* DataParallel is used.
* Batch size is 32.
* Pin memory set.
* Drop last set.
* Persistent workers set.
* We are …
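
One way to narrow down an idle gap like this is to time how long the training loop waits for each batch, separately for the first batch of every epoch. Below is a minimal sketch of that idea, not the pipeline from this post: `timed_batches`, `epoch_wait_profile`, and `SlowStartLoader` are hypothetical names, and `SlowStartLoader` is only a stand-in that imitates a loader with a stall at the start of each epoch. It uses the standard library only and measures host-side waiting, so it says nothing about what the GPU itself is doing.

```python
import time
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def timed_batches(loader: Iterable[T]) -> Iterator[Tuple[T, float]]:
    """Yield (batch, seconds the loop spent waiting for that batch)."""
    # The clock starts before the iterator is created, so the first
    # measurement includes any per-epoch startup cost of the loader.
    start = time.perf_counter()
    for batch in loader:
        waited = time.perf_counter() - start
        yield batch, waited
        # Restart the clock only after the caller has finished its
        # work on this batch (for example, the training step).
        start = time.perf_counter()


class SlowStartLoader:
    """Hypothetical stand-in for a data loader that stalls at epoch start."""

    def __init__(self, n_batches: int, startup_delay: float) -> None:
        self.n_batches = n_batches
        self.startup_delay = startup_delay

    def __iter__(self) -> Iterator[int]:
        time.sleep(self.startup_delay)  # simulated epoch-start stall
        for i in range(self.n_batches):
            yield i


def epoch_wait_profile(loader: Iterable[T], epochs: int) -> List[Tuple[float, float]]:
    """Return (first-batch wait, total wait for remaining batches) per epoch."""
    profile = []
    for _ in range(epochs):
        waits = [waited for _, waited in timed_batches(loader)]
        profile.append((waits[0], sum(waits[1:])))
    return profile


if __name__ == "__main__":
    for epoch, (first, rest) in enumerate(epoch_wait_profile(SlowStartLoader(5, 0.05), 2)):
        print(f"epoch {epoch}: first batch wait {first:.3f}s, remaining batches {rest:.3f}s")
```

In a real run, the same wrapper could go around the actual data loader inside the training loop. If the first-batch wait dominates in every epoch, the gap is on the data-loading side; if it does not, the idle time is more likely coming from something else that happens between epochs.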

