Feb. 13, 2024, 4:34 p.m. | /u/Singularian2501

Machine Learning | www.reddit.com

Paper: [https://arxiv.org/abs/2402.07033](https://arxiv.org/abs/2402.07033)

GitHub: [https://github.com/efeslab/fiddler](https://github.com/efeslab/fiddler)

Abstract:

>Large Language Models (LLMs) based on the Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks. However, running them in resource-constrained settings, where GPU memory is not abundant, is challenging due to their huge model sizes. Existing systems that offload model weights to CPU memory suffer from the significant overhead of frequently moving data between the CPU and GPU. In this paper, we propose Fiddler, a resource-efficient inference engine with CPU-GPU orchestration for MoE models. The …
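
The abstract's "CPU-GPU orchestration" suggests a simple pattern: rather than copying offloaded expert weights to the GPU every time an expert is activated, compute those experts directly on the CPU, since the activation tensors being moved are far smaller than the weight matrices. Below is a minimal PyTorch sketch of that pattern under that reading; the function and parameter names are hypothetical, not Fiddler's actual API, and it assumes top-1 routing for simplicity.

```python
import torch

def moe_layer_forward(x_gpu, experts, on_gpu, routing):
    """Hypothetical sketch of CPU-GPU orchestrated MoE inference.

    x_gpu:   (num_tokens, d) activations resident on the GPU
    experts: list of expert MLPs; experts[i]'s weights live on the
             device indicated by on_gpu[i]
    on_gpu:  list of bools, True if expert i is resident in GPU memory
    routing: (num_tokens,) index of the expert chosen per token (top-1)
    """
    out = torch.zeros_like(x_gpu)
    for i, expert in enumerate(experts):
        mask = routing == i
        if not mask.any():
            continue  # no tokens routed to this expert in this batch
        if on_gpu[i]:
            # Weights already on the GPU: compute there as usual.
            out[mask] = expert(x_gpu[mask])
        else:
            # Weights offloaded to CPU memory: move the small activation
            # slice to the CPU and run the expert on CPU cores, instead
            # of moving the large weight matrices to the GPU.
            out[mask] = expert(x_gpu[mask].cpu()).to(x_gpu.device)
    return out
```

A real engine would also overlap CPU and GPU work and handle top-k routing; this sketch only illustrates the avoided weight transfer.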

