all AI news
[R] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale - Facebook AI 2022 - Inference in LLMs with up to 175B parameters without performance degradation and making it possible to use these models on a single server with consumer GPUs!
Aug. 18, 2022, 5:28 p.m. | /u/Singularian2501
Machine Learning www.reddit.com
Github: [https://github.com/timdettmers/bitsandbytes](https://github.com/timdettmers/bitsandbytes)
Software Blogpost: [https://huggingface.co/blog/hf-bitsandbytes-integration](https://huggingface.co/blog/hf-bitsandbytes-integration)
Emergent Features Blogpost: [https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/)
Abstract:
>Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory needed for inference by half while retaining full precision performance. With our method, a 175B parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible …
ai consumer facebook facebook ai gpus inference llm llms machinelearning making performance scale server transformers
More from www.reddit.com / Machine Learning
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Staff Software Engineer, Generative AI, Google Cloud AI
@ Google | Mountain View, CA, USA; Sunnyvale, CA, USA
Expert Data Sciences
@ Gainwell Technologies | Any city, CO, US, 99999