Jan. 21, 2022, 2:10 a.m. | Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens

cs.LG updates on arXiv.org

We devise a performance model for GPU training of Deep Learning
Recommendation Models (DLRM), whose GPU utilization is low compared to other
well-optimized CV and NLP models. We show that both the device active time (the
sum of kernel runtimes) and the device idle time are important components of
the overall device time. We therefore tackle them separately by (1) flexibly
adopting heuristic-based and ML-based kernel performance models for operators
that dominate the device active time, and (2) categorizing operator …
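The decomposition described above (total device time = active time, i.e. the sum of per-kernel runtimes, plus idle time, with pluggable per-operator runtime models) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the class and function names, the roofline-style GEMM heuristic, and the `peak_flops` figure are all assumptions for the sake of the example.

```python
# Hypothetical sketch of the decomposition: device time = active + idle,
# where active time is predicted by per-operator kernel runtime models.

def heuristic_gemm_time(m, n, k, peak_flops=1.56e13):
    """Roofline-style heuristic: a GEMM performs 2*m*n*k FLOPs.

    peak_flops is an assumed device peak throughput, not a measured value.
    """
    return 2 * m * n * k / peak_flops


class SimplePerfModel:
    def __init__(self):
        # Map op name -> callable(params) -> predicted seconds.
        # An ML-based model (e.g. an MLP trained on kernel benchmarks)
        # would plug into the same interface for ops where no good
        # heuristic exists.
        self.kernel_models = {
            "gemm": lambda p: heuristic_gemm_time(*p),
        }

    def device_active_time(self, trace):
        # trace: list of (op_name, params) for kernels run on the GPU.
        return sum(self.kernel_models[op](params) for op, params in trace)

    def device_time(self, trace, idle_time):
        # Overall device time = active time (sum of kernel runtimes)
        # plus idle time, each modeled separately as in the abstract.
        return self.device_active_time(trace) + idle_time


model = SimplePerfModel()
trace = [("gemm", (1024, 1024, 1024))]
active = model.device_active_time(trace)
total = model.device_time(trace, idle_time=1e-3)
```

The key design point mirrored here is the uniform interface: heuristic and ML-based kernel models are interchangeable entries in `kernel_models`, so the idle-time model can be developed independently of the active-time predictors.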

