Feb. 8, 2024, 5:41 a.m. | Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

cs.LG updates on arXiv.org

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves the incoherence processing from QuIP by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization techniques to take advantage of the ball-shaped sub-Gaussian distribution that incoherent …
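The randomized Hadamard transform (RHT) step can be sketched roughly as follows. This is an illustrative NumPy/SciPy sketch of incoherence processing, not the paper's implementation: the matrix shapes, the sign-vector construction, and the rht_matrix helper are assumptions made for the example.

import numpy as np
from scipy.linalg import hadamard

def rht_matrix(n, rng):
    # Orthogonal matrix H @ diag(s): normalized Hadamard matrix times random signs.
    # hadamard(n) requires n to be a power of 2.
    H = hadamard(n).astype(float) / np.sqrt(n)
    s = rng.choice([-1.0, 1.0], size=n)
    return H * s  # column-wise sign flips, equal to H @ diag(s)

rng = np.random.default_rng(0)
m, n = 256, 512                     # toy weight-matrix dimensions (powers of 2)
W = rng.standard_normal((m, n))

U = rht_matrix(m, rng)              # left transform
V = rht_matrix(n, rng)              # right transform
W_inc = U @ W @ V.T                 # "incoherent" weights; recover W via U.T @ W_inc @ V

# Incoherent weights have a small maximum entry relative to their overall scale,
# which is what makes low-bit quantization of W_inc well behaved.
print(np.abs(W).max(), np.abs(W_inc).max())

Because both transforms are orthogonal, quantizing W_inc and undoing the transforms at inference time preserves the model up to quantization error; the RHT version is cheap to apply since a Hadamard multiply costs O(n log n) with a fast transform.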

compression cs.ai cs.cl cs.lg lattice llm llms low precision memory post-training quantization quip
