AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models. (arXiv:2311.01305v1 [cs.LG])
cs.LG updates on arXiv.org
Large language models (LLMs) exhibit excellent performance across a variety of tasks, but they come with significant computational and storage costs. Quantizing these models is an effective way to alleviate this issue. However, existing methods struggle to strike a balance between model accuracy and hardware efficiency. To address this, we introduce AWEQ, a post-training quantization method that requires no additional training overhead. AWEQ excels in both ultra-low-bit quantization and 8-bit weight and activation (W8A8) quantization. There is an observation that weight quantization …
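The truncated abstract names activation-weight equalization but does not spell out the procedure. Below is a minimal sketch of the general idea behind such equalization, as it appears in related post-training quantization work: per-channel scale factors migrate the dynamic range of activation outliers into the weights, leaving the layer's output mathematically unchanged while making both tensors easier to quantize. The function names, the alpha knob, and the NumPy implementation are illustrative assumptions, not AWEQ's actual method.

import numpy as np

def equalize_linear(W, X_calib, alpha=0.5, eps=1e-8):
    """Hypothetical activation-weight equalization for one linear layer.

    W:       (in_features, out_features) weight matrix
    X_calib: (num_tokens, in_features) calibration activations
    alpha:   assumed knob trading activation range against weight range
    """
    act_range = np.abs(X_calib).max(axis=0)   # per-input-channel activation range
    w_range = np.abs(W).max(axis=1)           # per-input-channel weight range
    # Scale factor that balances the two ranges per channel.
    s = (act_range ** alpha) / np.maximum(w_range ** (1.0 - alpha), eps)
    s = np.maximum(s, eps)
    # (X / s) @ (s * W) == X @ W, so the layer output is preserved exactly,
    # while activation outliers shrink before quantization.
    return W * s[:, None], s

def quantize_sym_int8(x, eps=1e-8):
    """Plain symmetric per-tensor int8 quantization (the W8A8 setting)."""
    scale = max(np.abs(x).max(), eps) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

At inference time, the 1/s factor on the activations can usually be folded into the preceding layer (for example a LayerNorm's affine parameters), so equalization adds no runtime overhead; that folding step is likewise an assumption about how such schemes are typically deployed, since the truncated abstract does not describe AWEQ's own fusion strategy.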