Nov. 11, 2022, 2:11 a.m. | Mark Zhao, Dhruv Choudhary, Devashish Tyagi, Ajay Somani, Max Kaplan, Sung-Han Lin, Sarunya Summa, Jongsoo Park, Aarti Basant, Niket Agarwal, Carole-J

cs.LG updates on arXiv.org arxiv.org

We present RecD (Recommendation Deduplication), a suite of end-to-end
infrastructure optimizations across the Deep Learning Recommendation Model
(DLRM) training pipeline. RecD addresses immense storage, preprocessing, and
training overheads caused by feature duplication inherent in industry-scale
DLRM training datasets. Feature duplication arises because DLRM datasets are
generated from interactions. While each user session can generate multiple
training samples, many features' values do not change across these samples. We
demonstrate how RecD exploits this property, end-to-end, across a deployed
training pipeline. RecD …
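To make the duplication property concrete, here is a minimal sketch of how repeated feature values within a batch could be stored once and referenced by index. It is an illustration only, not RecD's actual implementation: the function name `dedup_feature`, the sample data, and the (unique values, inverse index) layout are all assumptions introduced here.

```python
from typing import Hashable, List, Sequence, Tuple


def dedup_feature(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Store each distinct feature value once.

    Returns (unique, inverse) such that unique[inverse[i]] == values[i].
    Hypothetical helper for illustration; not taken from the paper.
    """
    unique: List[Hashable] = []
    index_of = {}          # maps a value to its slot in `unique`
    inverse: List[int] = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(unique)
            unique.append(v)
        inverse.append(index_of[v])
    return unique, inverse


# Made-up batch: four samples, the first three from the same user session,
# so their (hypothetical) user-history feature is identical.
batch_user_history = [
    (101, 205, 309),
    (101, 205, 309),
    (101, 205, 309),
    (412, 518),
]

unique, inverse = dedup_feature(batch_user_history)

# Any per-value work (storage, preprocessing, lookups) can run on `unique`
# and be expanded back to per-sample order through `inverse`.
restored = [unique[i] for i in inverse]
```

In this toy batch, four per-sample values collapse to two stored values plus four small integers; the saving grows with the number of samples that share a session.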

arxiv deep learning infrastructure recommendation recommendation model training
