On the Compressibility of Quantized Large Language Models | allainews.com

March 5, 2024, 2:42 p.m. | Yu Mao, Weilan Wang, Hongchao Du, Nan Guan, Chun Jason Xue

cs.LG updates on arXiv.org

arXiv:2403.01384v1 Announce Type: new
Abstract: Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory requirement of LLMs. Quantization is an effective way of reducing the model size while maintaining good performance. However, even after quantization, LLMs may still be too big to fit entirely into the limited memory of edge or mobile devices and have to …
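
The abstract says quantization shrinks a model but may still leave it too large for device memory, and the title asks how compressible the quantized weights are. The sketch below illustrates both ideas under loudly stated assumptions. It uses symmetric per-tensor int8 quantization, which is one common scheme; the abstract does not say which scheme the paper uses. It uses zlib as a stand-in general-purpose lossless compressor, not the authors' method. The weight tensor is synthetic Gaussian data, not a real model, and the helper names (`quantize_int8`, `dequantize_int8`, `compression_ratio`) are hypothetical.

```python
import random
import zlib

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme):
    scale = max|w| / 127, q = round(w / scale), clamped to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale reproduces it exactly
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]

def compression_ratio(q):
    """Raw int8 byte size divided by zlib-compressed size (higher = more compressible).
    zlib is only a stand-in lossless compressor for illustration."""
    raw = bytes(v & 0xFF for v in q)  # two's-complement byte encoding of each int8 value
    return len(raw) / len(zlib.compress(raw, 9))

random.seed(0)
# Hypothetical stand-in for one weight tensor: 100k Gaussian values (synthetic, not real LLM weights).
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
q, scale = quantize_int8(weights)

fp32_bytes = 4 * len(weights)  # 4 bytes per float32 weight
int8_bytes = len(q)            # 1 byte per int8 code
print(fp32_bytes / int8_bytes)  # 4.0
print(round(compression_ratio(q), 2))
```

Quantization alone yields a fixed 4x reduction from float32 to int8. Because the quantized codes cluster near zero, their byte distribution is skewed, so a lossless compressor can usually shrink them somewhat further. The actual gain on real quantized LLMs depends on the model and the scheme, and this toy example does not measure it.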
