May 6, 2024, 4:42 a.m. | Shaoyuan Chen, Yutong Lin, Mingxing Zhang, Yongwei Wu

cs.LG updates on arXiv.org

arXiv:2405.01814v1 Announce Type: new
Abstract: Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention operator is memory-intensive, exhibiting a memory access pattern that clashes with the strengths of modern accelerators, especially as context length increases. To enhance the efficiency and …
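The abstract's claim that decode-time attention clashes with computation-optimized accelerators can be made concrete with a rough arithmetic-intensity estimate. The sketch below is not from the paper; the head count, head dimension, and accelerator throughput/bandwidth figures are illustrative assumptions chosen only to show that attention's FLOPs-per-byte stays near 1 regardless of context length, far below a typical accelerator's ridge point.

```python
# Back-of-the-envelope sketch (illustrative, not from arXiv:2405.01814):
# per generated token, decode attention streams the entire KV cache from
# memory but performs only a handful of FLOPs per byte read, so it is
# memory-bound on compute-optimized accelerators.

def decode_attention_intensity(context_len: int, n_heads: int = 32,
                               head_dim: int = 128,
                               bytes_per_elem: int = 2) -> float:
    """Approximate FLOPs per byte for single-query (batch=1) decode attention."""
    d = n_heads * head_dim
    # QK^T and PV each cost roughly 2 * context_len * d FLOPs for one query token.
    flops = 4 * context_len * d
    # Keys and values for every cached token must be read from memory.
    bytes_read = 2 * context_len * d * bytes_per_elem
    return flops / bytes_read  # ~1 FLOP/byte, independent of context length


if __name__ == "__main__":
    # Assumed accelerator ridge point: ~300 TFLOP/s over ~2 TB/s ≈ 150 FLOPs/byte.
    ridge_point = 300e12 / 2e12
    for ctx in (1_024, 16_384, 131_072):
        ai = decode_attention_intensity(ctx)
        print(f"context={ctx:>7}: attention ≈ {ai:.2f} FLOPs/byte "
              f"(accelerator ridge ≈ {ridge_point:.0f})")
```

Under these assumptions the attention operator sits two orders of magnitude below the accelerator's ridge point, which is the mismatch the paper targets by offloading attention to memory-oriented hardware.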

