Nov. 21, 2022, 9:37 p.m. | /u/Singularian2501


Paper: [https://arxiv.org/abs/2211.10438](https://arxiv.org/abs/2211.10438)

GitHub: [https://github.com/mit-han-lab/smoothquant](https://github.com/mit-han-lab/smoothquant)

Abstract:

Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, for LLMs beyond 100 billion parameters, existing methods cannot maintain accuracy or do not run efficiently on hardware. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs that can be implemented efficiently. We observe that systematic outliers appear at fixed activation channels. …
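The core trick behind SmoothQuant is to migrate the quantization difficulty from activations (which have large per-channel outliers) to weights (which are easy to quantize) using a per-channel scale s_j = max|X_j|^α / max|W_j|^(1−α). Below is a minimal PyTorch sketch of that idea, assuming activation maxima have already been collected on a small calibration set; the function names and calling convention are illustrative, not the API of the released repo:

```python
import torch

def smooth_scales(act_absmax: torch.Tensor, weight: torch.Tensor,
                  alpha: float = 0.5, eps: float = 1e-5) -> torch.Tensor:
    """Per-input-channel smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    w_absmax = weight.abs().amax(dim=0)  # per-input-channel max |W|, shape [in_features]
    return act_absmax.clamp(min=eps).pow(alpha) / w_absmax.clamp(min=eps).pow(1 - alpha)

def apply_smoothing(linear: torch.nn.Linear, act_absmax: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    # (X / s) @ (W * s).T == X @ W.T, so the layer output is mathematically unchanged,
    # but the rescaled activations X / s no longer have extreme outlier channels and
    # can be quantized to INT8 with a simple per-tensor scale.
    s = smooth_scales(act_absmax, linear.weight.data, alpha)
    linear.weight.data.mul_(s)  # fold s into the weights (broadcast over output rows)
    return s                    # incoming activations must be divided by s
```

In the paper this division by s is folded offline into the preceding LayerNorm (or previous linear layer), so the smoothing adds no runtime overhead before standard W8A8 quantization; α (default 0.5) controls how much difficulty is shifted onto the weights.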

