May 24, 2023, 9:12 p.m. | /u/jesst177

Deep Learning | www.reddit.com

Hi!

I am trying to improve the memory and speed efficiency of our PyTorch training pipeline. While inspecting it, I noticed that our GPUs go idle after every epoch (visual at the end).

Our environment is as follows (a minimal sketch of the setup is shown after the list):

* 2x V100 (Azure cloud)
* PyTorch 1.13
* CUDA 11.6
* AMP is enabled
* num_workers is 8
* DataParallel is used
* Batch size is 32
* pin_memory is set
* drop_last is set
* persistent_workers is set
* We are …
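In case it helps, here is a minimal sketch of how the pieces above are wired together. It is not our actual pipeline: a synthetic TensorDataset and a toy model stand in for our real dataset and model, but the DataLoader flags, DataParallel wrapping, and AMP loop match the settings listed.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for our real dataset and model (assumptions for illustration only).
dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
model = nn.DataParallel(
    nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
).cuda()  # 2x V100 via DataParallel
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # AMP loss scaling

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=8,
    pin_memory=True,           # pin memory
    drop_last=True,            # drop last
    persistent_workers=True,   # keep workers alive across epochs
)

for epoch in range(3):
    for inputs, targets in loader:
        inputs = inputs.cuda(non_blocking=True)    # non_blocking pairs with pin_memory
        targets = targets.cuda(non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():            # mixed-precision forward pass
            loss = nn.functional.cross_entropy(model(inputs), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    # the GPU idle gap we observe shows up here, at the epoch boundary
```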

