Aug. 18, 2022, 5:28 p.m. | /u/Singularian2501


Paper: [https://arxiv.org/abs/2208.07339](https://arxiv.org/abs/2208.07339)

GitHub: [https://github.com/timdettmers/bitsandbytes](https://github.com/timdettmers/bitsandbytes)

Software Blogpost: [https://huggingface.co/blog/hf-bitsandbytes-integration](https://huggingface.co/blog/hf-bitsandbytes-integration)

Emergent Features Blogpost: [https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/)

Abstract:

>Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance. With our method, a 175B-parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible …
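The truncated abstract points at the core mechanism: quantize activations and weights to Int8 with per-vector absmax scaling, run the matmul in integer arithmetic, then rescale the result back to floating point. Below is a minimal NumPy sketch of that vector-wise absmax idea, assuming row-wise scales for the activations and column-wise scales for the weights; the function names are hypothetical, this is not the bitsandbytes CUDA implementation, and it omits the mixed-precision decomposition for outlier features that the full LLM.int8() method adds on top.

```python
# Minimal sketch of vector-wise absmax Int8 matmul (hypothetical helpers,
# not the bitsandbytes kernels). Omits LLM.int8()'s outlier decomposition.
import numpy as np

def quantize_absmax(a: np.ndarray, axis: int):
    """Scale each vector along `axis` into [-127, 127] and round to int8."""
    scale = 127.0 / np.max(np.abs(a), axis=axis, keepdims=True)
    return np.round(a * scale).astype(np.int8), scale

def int8_matmul(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Approximate X @ W with an Int8 matmul plus a float rescale."""
    Xq, sx = quantize_absmax(X, axis=1)   # one scale per row of X
    Wq, sw = quantize_absmax(W, axis=0)   # one scale per column of W
    # Accumulate in int32, then undo both scales: (X*sx) @ (W*sw)
    # equals sx_i * sw_j * (X @ W)_ij, so divide by the outer product.
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
    return acc / (sx * sw)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 8)).astype(np.float32)
print(np.max(np.abs(int8_matmul(X, W) - X @ W)))  # small quantization error
```

In practice you don't implement this yourself: per the software blogpost linked above, the Hugging Face integration exposes the method through `load_in_8bit=True` in `from_pretrained`.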

