Attention-aware Post-training Quantization without Backpropagation
June 21, 2024, 4:46 a.m. | Junhan Kim, Ho-young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon
cs.LG updates on arXiv.org (arxiv.org)
Abstract: Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, whether for post-training quantization (PTQ) or quantization-aware training (QAT), and this becomes problematic for hyper-scale LLMs with billions of parameters. The overhead can be alleviated by recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited because they do not account for inter-layer dependencies. In this paper, we thus propose a …
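For readers unfamiliar with the baseline the abstract contrasts against, the sketch below illustrates what a backpropagation-free PTQ step looks like in its simplest form: per-row round-to-nearest quantization of pretrained weights with a closed-form scale and zero-point, so no gradients are ever computed. This is a minimal, hypothetical example (the function `quantize_rtn` and its parameters are not from the paper) and is not the attention-aware method the authors propose.

```python
# Minimal sketch of backpropagation-free post-training quantization (PTQ).
# NOT the paper's attention-aware method; it only shows the baseline idea of
# quantizing pretrained weights with a closed-form round-to-nearest rule,
# so no gradient-based optimization or backpropagation is involved.
import numpy as np


def quantize_rtn(weights: np.ndarray, n_bits: int = 4):
    """Per-row asymmetric round-to-nearest quantization (hypothetical helper)."""
    qmax = 2 ** n_bits - 1
    w_min = weights.min(axis=1, keepdims=True)
    w_max = weights.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax              # quantization step per output row
    zero_point = np.round(-w_min / scale)       # shift so the range starts at 0
    q = np.clip(np.round(weights / scale) + zero_point, 0, qmax)
    dequantized = (q - zero_point) * scale      # values the model sees at inference
    return q.astype(np.uint8), dequantized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 16)).astype(np.float32)   # toy "layer" weights
    q, w_hat = quantize_rtn(w, n_bits=4)
    print("max abs quantization error:", np.abs(w - w_hat).max())
```

Because each layer is quantized independently with such a rule, inter-layer dependencies are ignored, which is the limitation the abstract points to.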