Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference | allainews.com

April 9, 2024, 4:44 a.m. | Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

cs.LG updates on arXiv.org arxiv.org

arXiv:2312.15159v2 Announce Type: replace
Abstract: Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads. The majority of existing approaches rely on temporal architectures that reuse hardware units for different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper investigates the feasibility and potential of model-specific spatial acceleration for LLM inference on FPGAs. Our approach …

abstract architectures arxiv cs.ai cs.ar cs.cl cs.lg demand deployment fpga generated hardware however inference language language model language models large language large language model large language models llms network operators parameters spatial temporal type understanding units workloads

More from arxiv.org / cs.LG updates on arXiv.org

Course Recommender Systems Need to Consider the Job Market 20 hours ago | arxiv.org

abstract arxiv course cs.ir +16

$\texttt{immrax}$: A Parallelizable and Differentiable Toolbox for Interval Analysis and Mixed Monotone Reachability in JAX 20 hours ago | arxiv.org

abstract analysis arxiv compilation +18

Thousands of AI Authors on the Future of AI 20 hours ago | arxiv.org

abstract advanced advanced ai ai progress +21

Graphene: Infrastructure Security Posture Analysis with AI-generated Attack Graphs 20 hours ago | arxiv.org

abstract analysis arxiv assessment +24

Volume-Preserving Transformers for Learning Time Series Data with Structure 20 hours ago | arxiv.org

abstract arxiv cs.lg cs.na +24

Eureka: Human-Level Reward Design via Coding Large Language Models 20 hours ago | arxiv.org

abstract algorithm arxiv bridge +25

Reconstruction of Unstable Heavy Particles Using Deep Symmetry-Preserving Attention Networks 20 hours ago | arxiv.org

abstract arxiv attention cs.lg +11

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search 20 hours ago | arxiv.org

abstract arxiv become compression +24

Gaussian random field approximation via Stein's method with applications to wide random neural networks 20 hours ago | arxiv.org

abstract applications approximation arxiv +14

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Business Data Analyst

@ Alstom | Johannesburg, GT, ZA

View on ai-jobs.net