AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning
May 3, 2024, 4:52 a.m. | Daniel Coquelin, Katherina Flügel, Marie Weiel, Nicholas Kiefer, Muhammed Öz, Charlotte Debus, Achim Streit, Markus Götz
cs.LG updates on arXiv.org
Abstract: Communication bottlenecks hinder the scalability of distributed neural network training, particularly on distributed-memory computing clusters. To significantly reduce this communication overhead, we introduce AB-training, a novel data-parallel training method that decomposes weight matrices into low-rank representations and utilizes independent group-based training. This approach consistently reduces network traffic by 50% across multiple scaling scenarios, increasing the training potential on communication-constrained systems. Our method exhibits regularization effects at smaller scales, leading to improved generalization for models like …
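To make the low-rank idea in the abstract concrete, the following is a minimal sketch of the generic parameter-count arithmetic behind factoring a weight matrix, assuming an m x n matrix W approximated by two factors A (m x r) and B (r x n). The function names, the matrix shape, and the rank are hypothetical choices for illustration only. The sketch does not reproduce the paper's method or its reported 50% traffic reduction, which depend on details not given in the abstract.

```python
def dense_params(m: int, n: int) -> int:
    """Number of entries in a full m x n weight matrix W."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Entries in factors A (m x r) and B (r x n), where W is approximated by A @ B."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> float:
    """Rank below which the two factors together are smaller than the dense matrix."""
    return (m * n) / (m + n)


# Hypothetical layer shape and rank, chosen only for illustration.
m, n, r = 1024, 1024, 128

dense = dense_params(m, n)
factored = low_rank_params(m, n, r)
ratio = factored / dense  # fraction of the dense size that the factors occupy

print(f"dense: {dense}, factored: {factored}, ratio: {ratio:.2f}")
print(f"break-even rank: {break_even_rank(m, n):.0f}")
```

In a data-parallel setting, the values synchronized between workers scale with the number of trainable parameters, so exchanging the factors instead of the dense matrix is one generic way a low-rank representation can lower network traffic. Whether and how AB-training does this is specified in the paper, not here.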