March 28, 2024, 4:43 a.m. | Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele

cs.LG updates on arXiv.org

arXiv:2305.13525v2 Announce Type: replace
Abstract: Large communication costs are a critical bottleneck in training state-of-the-art neural networks on distributed systems. This paper introduces AxoNN, a novel four-dimensional (4D) parallelization approach, inspired by Agarwal's algorithm for matrix multiplication, for parallelizing tensor computations in deep learning. AxoNN employs two key strategies to minimize communication overhead. First, we optimize communication by overlapping expensive collective operations (reduce-scatter, all-gather, all-reduce) with computations. Our experiments with a 20-billion parameter transformer model demonstrate that these optimizations deliver …
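The abstract's first strategy, overlapping collectives with computation, can be illustrated with a minimal sketch using PyTorch's asynchronous collectives. This is not AxoNN's actual implementation; the names `local_shard`, `gathered`, and `unrelated_work` are hypothetical, and the example assumes a `torch.distributed` process group has already been initialized.

```python
import torch
import torch.distributed as dist


def overlapped_all_gather(local_shard: torch.Tensor, unrelated_work):
    """Launch an all-gather asynchronously and hide its latency behind
    computation that does not depend on the gathered result."""
    world_size = dist.get_world_size()
    gathered = [torch.empty_like(local_shard) for _ in range(world_size)]

    # Start the collective in the background instead of blocking on it.
    handle = dist.all_gather(gathered, local_shard, async_op=True)

    # Do independent work while communication is in flight.
    partial = unrelated_work()

    # Synchronize only when the gathered tensors are actually needed.
    handle.wait()
    return torch.cat(gathered, dim=0), partial
```

The same pattern applies to reduce-scatter and all-reduce: issue the collective with `async_op=True`, schedule computation that has no dependence on its output, and call `wait()` just before the result is consumed.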

