[R] Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models - University of Washington 2024 - Over 10x faster in inference than existing systems!

Feb. 13, 2024, 4:34 p.m. | /u/Singularian2501

Machine Learning www.reddit.com

Paper: [https://arxiv.org/abs/2402.07033](https://arxiv.org/abs/2402.07033)

Github: [https://github.com/efeslab/fiddler](https://github.com/efeslab/fiddler)

Abstract:

>Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks. However, running them on resource-constrained settings, where GPU memory resources are not abundant, is challenging due to huge model sizes. Existing systems that offload model weights to CPU memory suffer from the significant overhead of frequently moving data between CPU and GPU. In this paper, we propose Fiddler, a resource-efficient inference engine with CPU-GPU orchestration for MoE models. The …
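To get a feel for the data-movement overhead the abstract mentions, here is a rough back-of-envelope sketch. It is not taken from the paper or the Fiddler repo. It only compares how many bytes cross the CPU-GPU link when one expert's weights are copied versus when only the token activations are copied. The layer sizes, the 16-bit precision, the three-matrix expert layout, and the link bandwidth are all illustrative assumptions, and the function names are made up for this example.

```python
# Back-of-envelope comparison: bytes moved across the CPU-GPU link when
# (a) one expert's weights are copied, versus (b) only activations are copied.
# All sizes and the bandwidth below are assumptions for illustration only.


def expert_weight_bytes(hidden: int, ffn: int,
                        num_matrices: int = 3,
                        bytes_per_param: int = 2) -> int:
    """Size of one expert's feed-forward weights.

    Assumes num_matrices weight matrices of shape (hidden, ffn) or its
    transpose, stored at bytes_per_param bytes per parameter.
    """
    return num_matrices * hidden * ffn * bytes_per_param


def activation_bytes(num_tokens: int, hidden: int,
                     bytes_per_value: int = 2) -> int:
    """Bytes to send token inputs one way and bring outputs back."""
    return 2 * num_tokens * hidden * bytes_per_value


def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Idealized transfer time: ignores latency and protocol overhead."""
    return num_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    hidden, ffn = 4096, 14336      # assumed layer sizes
    link_bandwidth = 16e9          # assumed link bandwidth in bytes/s

    weights = expert_weight_bytes(hidden, ffn)
    acts = activation_bytes(num_tokens=1, hidden=hidden)

    print(f"weights per expert: {weights / 1e6:.1f} MB")
    print(f"activations, 1 token: {acts / 1e3:.1f} KB")
    print(f"weight copy time: {transfer_seconds(weights, link_bandwidth) * 1e3:.2f} ms")
    print(f"activation copy time: {transfer_seconds(acts, link_bandwidth) * 1e6:.2f} us")
```

Under these assumed numbers, the weights are several orders of magnitude larger than the activations for a single token, which is the kind of gap that makes frequent weight transfers expensive. It says nothing about the CPU's compute speed, so it is not a full cost model.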


