Feb. 13, 2024, 4:34 p.m. | /u/Singularian2501

Machine Learning | www.reddit.com

Paper: [https://arxiv.org/abs/2402.07033](https://arxiv.org/abs/2402.07033)

GitHub: [https://github.com/efeslab/fiddler](https://github.com/efeslab/fiddler)

Abstract:

>Large Language Models (LLMs) based on the Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks. However, running them in resource-constrained settings, where GPU memory is not abundant, is challenging due to their huge model sizes. Existing systems that offload model weights to CPU memory suffer from the significant overhead of frequently moving data between the CPU and GPU. In this paper, we propose Fiddler, a resource-efficient inference engine with CPU-GPU orchestration for MoE models. The …
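
The abstract's "CPU-GPU orchestration" suggests a simple pattern: rather than copying offloaded expert weights to the GPU every time an expert is activated, compute those experts directly on the CPU, since the activation tensors being moved are far smaller than the weight matrices. Below is a minimal PyTorch sketch of that pattern under that reading; the function and parameter names are hypothetical, not Fiddler's actual API, and it assumes top-1 routing for simplicity.

```python
import torch

def moe_layer_forward(x_gpu, experts, on_gpu, routing):
    """Hypothetical sketch of CPU-GPU orchestrated MoE inference.

    x_gpu:   (num_tokens, d) activations resident on the GPU
    experts: list of expert MLPs; experts[i]'s weights live on the
             device indicated by on_gpu[i]
    on_gpu:  list of bools, True if expert i is resident in GPU memory
    routing: (num_tokens,) index of the expert chosen per token (top-1)
    """
    out = torch.zeros_like(x_gpu)
    for i, expert in enumerate(experts):
        mask = routing == i
        if not mask.any():
            continue  # no tokens routed to this expert in this batch
        if on_gpu[i]:
            # Weights already on the GPU: compute there as usual.
            out[mask] = expert(x_gpu[mask])
        else:
            # Weights offloaded to CPU memory: move the small activation
            # slice to the CPU and run the expert on CPU cores, instead
            # of moving the large weight matrices to the GPU.
            out[mask] = expert(x_gpu[mask].cpu()).to(x_gpu.device)
    return out
```

A real engine would also overlap CPU and GPU work and handle top-k routing; this sketch only illustrates the avoided weight transfer.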

