APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
Feb. 26, 2024, 5:41 a.m. | Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu
cs.LG updates on arXiv.org (arxiv.org)
Abstract: Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for deployment on edge devices. To this end, we propose APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs, which considers not only the second-order information of each layer's weights, but also, for the first time, the nonlinear effect of attention outputs on the entire model. We leverage the Hessian trace as …
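The abstract is cut off before explaining how the Hessian trace is actually used, so as a rough illustration only, here is a minimal PyTorch sketch of the general idea behind Hessian-trace-guided mixed-precision allocation: estimate the trace of the loss Hessian per weight matrix with Hutchinson's estimator and give more bits to the more sensitive layers. The toy model, bit choices, and thresholding rule are assumptions for demonstration, not the APTQ authors' method.

```python
# Illustrative sketch (not the APTQ implementation): Hutchinson's estimator
# approximates tr(H) of the loss w.r.t. each weight matrix; layers with a
# larger trace are treated as more sensitive and kept at higher precision.
import torch
import torch.nn as nn

def hessian_trace(loss, param, n_samples=8):
    """Hutchinson estimate of the Hessian trace for one parameter tensor."""
    grad = torch.autograd.grad(loss, param, create_graph=True)[0]
    trace = 0.0
    for _ in range(n_samples):
        # Rademacher probe vector of +/-1 with the same shape as the weights.
        v = torch.randint_like(param, high=2) * 2.0 - 1.0
        # Hessian-vector product via a second backward pass.
        hv = torch.autograd.grad(grad, param, grad_outputs=v, retain_graph=True)[0]
        trace += (v * hv).sum().item()
    return trace / n_samples

# Toy stand-ins for an LLM and a calibration batch.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
loss = nn.functional.cross_entropy(model(x), y)

# Rank weight matrices by estimated sensitivity and assign bit-widths.
traces = {name: hessian_trace(loss, p)
          for name, p in model.named_parameters() if p.dim() == 2}
median = sorted(traces.values())[len(traces) // 2]
bit_alloc = {name: (4 if t >= median else 2) for name, t in traces.items()}
print(bit_alloc)
```

In this sketch the median trace is an arbitrary cutoff; the paper's contribution, per the abstract, is to fold the nonlinear effect of attention outputs into the sensitivity measure rather than relying on per-layer weight curvature alone.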