Nov. 5, 2023, 6:43 a.m. | Baisong Li, Xingwang Wang, Haixiao Xu

cs.LG updates on arXiv.org

Large language models (LLMs) exhibit excellent performance across a variety of
tasks, but they come with significant computational and storage costs.
Quantizing these models is an effective way to alleviate this issue. However,
existing methods struggle to strike a balance between model accuracy and
hardware efficiency. To address this, we introduce AWEQ, a post-training
quantization method that requires no additional training overhead. AWEQ excels
in both ultra-low-bit quantization and 8-bit weight and activation (W8A8)
quantization. We observe that weight quantization …
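The excerpt does not spell out AWEQ's exact equalization rule, but the core idea of activation-weight equalization can be illustrated with a small sketch: per-channel scales move dynamic range from activation outliers into the weights before both are quantized to 8 bits. The function names (equalize, quantize_sym_int8) and the square-root scale rule below are illustrative assumptions, not the paper's verified method.

```python
import numpy as np

def equalize(W: np.ndarray, X: np.ndarray, eps: float = 1e-8):
    """Rescale per input channel so activation and weight ranges are balanced.

    W: weight matrix of shape (out_features, in_features)
    X: calibration activations of shape (num_tokens, in_features)
    Returns (W_eq, X_eq, s) with X_eq @ W_eq.T == X @ W.T up to float error.
    """
    a_max = np.abs(X).max(axis=0) + eps   # per-channel activation range
    w_max = np.abs(W).max(axis=0) + eps   # per-channel weight range
    s = np.sqrt(a_max / w_max)            # assumed scale rule balancing the two ranges
    X_eq = X / s                          # shrink activation outlier channels
    W_eq = W * s                          # absorb the scale into the weights
    return W_eq, X_eq, s

def quantize_sym_int8(T: np.ndarray):
    """Symmetric per-tensor int8 quantization, a common W8A8 baseline."""
    scale = np.abs(T).max() / 127.0
    q = np.clip(np.round(T / scale), -127, 127).astype(np.int8)
    return q, scale

# Usage: equalize on calibration data, then quantize weights and activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)).astype(np.float32)
X = rng.normal(size=(128, 64)).astype(np.float32)
X[:, 3] *= 20.0                           # simulate one activation outlier channel
W_eq, X_eq, s = equalize(W, X)
qW, sw = quantize_sym_int8(W_eq)
qX, sx = quantize_sym_int8(X_eq)
Y_approx = (qX.astype(np.float32) * sx) @ (qW.astype(np.float32) * sw).T
```

Because the equalized product X_eq @ W_eq.T equals the original X @ W.T exactly in full precision, any accuracy change comes only from how well the rescaled tensors fit the int8 grid.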
