Nov. 18, 2022, 2:12 a.m. | Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens

cs.LG updates on arXiv.org arxiv.org

We devise a performance model for GPU training of Deep Learning
Recommendation Models (DLRM), whose GPU utilization is low compared to other
well-optimized CV and NLP models. We show that both the device active time (the
sum of kernel runtimes) and the device idle time are important components
of the overall device time. We therefore tackle them separately by (1) flexibly
adopting heuristic-based and ML-based kernel performance models for operators
that dominate the device active time, and (2) categorizing …
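The decomposition the abstract describes, total device time split into active time (the sum of kernel runtimes) and idle time, can be sketched in a few lines. The function name and the kernel durations below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of the device-time decomposition described above:
# total device time = active time (sum of kernel runtimes) + idle time.

def device_time_breakdown(kernel_runtimes_ms, total_device_time_ms):
    """Split total device time into active (kernel) and idle components."""
    active_ms = sum(kernel_runtimes_ms)          # device active time
    idle_ms = total_device_time_ms - active_ms   # remainder is idle time
    return active_ms, idle_ms

# Illustrative trace: per-kernel runtimes (ms) from a DLRM-like training step.
kernels = [0.42, 1.37, 0.85]
active, idle = device_time_breakdown(kernels, total_device_time_ms=4.0)
print(f"active = {active:.2f} ms, idle = {idle:.2f} ms")
```

Modeling the two components separately, as the paper proposes, makes sense because they have different causes: active time is driven by the kernels of dominant operators, while idle time comes from gaps between kernel launches.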

