[R] QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models - Institute of Science and Technology Austria (ISTA) 2023 - Can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss
Oct. 26, 2023, 7:01 p.m. | /u/Singularian2501
r/MachineLearning (www.reddit.com)
Github: [https://github.com/ist-daslab/qmoe](https://github.com/ist-daslab/qmoe)
Abstract:
>Mixture-of-Experts (MoE) architectures offer a general solution to the high inference costs of large language models (LLMs) via sparse routing, bringing faster and more accurate models, at the cost of massive parameter counts. For example, the SwitchTransformer-c2048 model has 1.6 trillion parameters, requiring 3.2TB of accelerator memory to run efficiently, which makes practical deployment challenging and expensive. In this paper, we present a solution to this memory problem, in the form of a new compression and …
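For intuition, the headline numbers follow directly from the parameter count and the per-parameter storage cost. A minimal back-of-the-envelope sketch, assuming the uncompressed model is held in bfloat16 (16 bits per parameter; the 0.8 bits per parameter figure is the compressed size reported in the paper):

```python
# Memory math behind the QMoE headline figures for SwitchTransformer-c2048.
# Assumption: uncompressed weights stored in bfloat16 (16 bits/parameter);
# decimal TB/GB units, matching the 3.2TB / 160GB figures in the post.

params = 1.6e12                      # 1.6 trillion parameters

uncompressed_bytes = params * 16 / 8  # bf16 storage
compressed_bytes = params * 0.8 / 8   # sub-1-bit compressed storage

TB, GB = 1e12, 1e9
print(f"Uncompressed: {uncompressed_bytes / TB:.1f} TB")            # ~3.2 TB
print(f"Compressed:   {compressed_bytes / GB:.0f} GB")              # ~160 GB
print(f"Ratio:        {uncompressed_bytes / compressed_bytes:.0f}x")  # 20x
```

The 20x ratio is just 16 bits divided by 0.8 bits per parameter; the point of the paper is achieving that rate with only minor accuracy loss and with a practical execution framework.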