Model Compression and Efficient Inference for Large Language Models: A Survey | allainews.com

Feb. 16, 2024, 5:43 a.m. | Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He

cs.LG updates on arXiv.org

arXiv:2402.09748v1 Announce Type: cross
Abstract: Transformer-based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during inference make it challenging to deploy large models on resource-constrained devices. In this paper, we investigate compression and efficient inference methods for large language models from an algorithmic perspective. Regarding taxonomy, as with smaller models, compression and acceleration algorithms for large language models can still be categorized into quantization, pruning, distillation, compact architecture design, dynamic …
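The abstract lists quantization as one of the compression categories without describing how it works. The snippet below is a minimal sketch that assumes the simplest scheme: symmetric, per-tensor, round-to-nearest int8. The survey itself may cover many other variants. The function names `quantize_int8` and `dequantize_int8` are hypothetical and exist only for illustration. The sketch shows how trading precision for a smaller integer format reduces weight memory.

```python
# Illustrative sketch (an assumption, not the survey's method): symmetric
# per-tensor int8 quantization. All weights share one scale factor and are
# mapped to integer codes in [-127, 127], then mapped back (dequantized).
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize floats to int8 codes using a single shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Python's round() is round-to-nearest with ties-to-even; clamp for safety.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 1.0]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # Each weight is now stored in 1 byte instead of 4 (fp32) or 2 (fp16);
    # the per-weight reconstruction error is at most about scale / 2.
    print(q, s)
    print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

The single per-tensor scale keeps the sketch short. A large outlier weight stretches the scale for every other weight, which coarsens their resolution. Finer-grained scales, such as per-channel or per-group, are a common remedy, although this sketch does not implement them.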
