Feb. 26, 2024, 5:41 a.m. | Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu

cs.LG updates on arXiv.org

arXiv:2402.14866v1 Announce Type: new
Abstract: Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for deployment on edge devices. To this end, we propose APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs, which considers not only the second-order information of each layer's weights, but also, for the first time, the nonlinear effect of attention outputs on the entire model. We leverage the Hessian trace as …
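The truncated abstract points at the core mechanism: per-layer Hessian traces act as a sensitivity signal for deciding which layers tolerate aggressive low-bit quantization. As a rough illustration only, the PyTorch sketch below estimates each weight matrix's Hessian trace with Hutchinson's estimator and then applies a toy ranking-based bit assignment. The tiny model, the allocation rule, and all function names are hypothetical; this is not APTQ's attention-aware procedure, which the abstract only partially describes.

```python
# A minimal PyTorch sketch (not the authors' implementation) of the general idea
# behind Hessian-trace-guided mixed-precision allocation: layers with a larger
# Hessian trace are treated as more quantization-sensitive and keep more bits.
# The stand-in model, the allocation rule, and every name here are illustrative.

import torch


def hutchinson_hessian_trace(loss, param, num_samples=8):
    """Estimate tr(H) for `param` with Hutchinson's method, E[v^T H v],
    using Rademacher probe vectors and Hessian-vector products."""
    grad = torch.autograd.grad(loss, param, create_graph=True)[0]
    estimate = 0.0
    for _ in range(num_samples):
        v = (torch.rand_like(param) < 0.5).to(param.dtype) * 2 - 1  # +/-1 probes
        hv = torch.autograd.grad(grad, param, grad_outputs=v, retain_graph=True)[0]
        estimate += (v * hv).sum().item()
    return estimate / num_samples


def allocate_bits(sensitivities, high_bits=4, low_bits=2, high_fraction=0.5):
    """Toy rule: the most sensitive fraction of layers keeps `high_bits`,
    the rest drops to `low_bits`. APTQ's actual allocation is more refined."""
    order = sorted(sensitivities, key=sensitivities.get, reverse=True)
    cutoff = max(1, round(high_fraction * len(order)))
    return {name: (high_bits if i < cutoff else low_bits)
            for i, name in enumerate(order)}


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in model and calibration batch, purely for illustration.
    model = torch.nn.Sequential(
        torch.nn.Linear(16, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4)
    )
    x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)

    sens = {name: hutchinson_hessian_trace(loss, p)
            for name, p in model.named_parameters() if p.dim() == 2}
    print("Hessian-trace sensitivity:", sens)
    print("Bit-width assignment:", allocate_bits(sens))
```

Per the abstract, APTQ additionally folds the nonlinear effect of attention outputs into its sensitivity measure, which a purely per-layer Hessian-trace estimate like the one above does not capture.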

