May 6, 2024, 4:43 a.m. | Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu

cs.LG updates on arXiv.org

arXiv:2311.03703v2 Announce Type: replace
Abstract: We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner's dilemma of deciding when a solution is good enough. To this end, we design novel mixed-integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse …
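
To make the objective concrete, here is a minimal sketch of the bottleneck-minimization problem on a deliberately simplified instance. It is not the authors' algorithm or their MIP relaxation. It assumes the model is a linear chain of operators, that each stage is a contiguous block of that chain, and that a stage pays the communication cost of every cut edge it touches. The names `min_bottleneck_chain`, `work`, and `comm` are hypothetical and do not come from the paper. Under these assumptions the problem is solvable exactly by dynamic programming, unlike the NP-hard general graph case the abstract addresses.

```python
from itertools import accumulate


def min_bottleneck_chain(work, comm, k):
    """Minimum bottleneck cost of splitting a chain into k contiguous stages.

    work[i] : running time of operator i (assumed cost model)
    comm[i] : communication cost if the edge i -> i+1 is cut (assumed)
    k       : number of non-empty pipeline stages
    """
    n = len(work)
    if len(comm) != n - 1 or not 1 <= k <= n:
        raise ValueError("need len(comm) == len(work) - 1 and 1 <= k <= n")

    # prefix[i] = total work of operators 0..i-1
    prefix = [0, *accumulate(work)]

    def stage_cost(a, b):
        # Cost of the stage holding operators a..b-1 (half-open range):
        # its compute time plus the cut edges entering and leaving it.
        cost = prefix[b] - prefix[a]
        if a > 0:
            cost += comm[a - 1]
        if b < n:
            cost += comm[b - 1]
        return cost

    inf = float("inf")
    # best[j][b] = smallest bottleneck when operators 0..b-1 form j stages
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for j in range(1, k + 1):
        for b in range(j, n + 1):
            for a in range(j - 1, b):
                candidate = max(best[j - 1][a], stage_cost(a, b))
                if candidate < best[j][b]:
                    best[j][b] = candidate
    return best[k][n]
```

For example, `min_bottleneck_chain([4, 2, 3, 5], [1, 1, 1], 2)` evaluates the three possible cut points and returns the smallest resulting bottleneck. The triple loop runs in O(k n^2) time, which is adequate for illustration but says nothing about the general graph setting studied in the paper.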

