Aug. 24, 2022, 11:57 p.m. | /u/ai-lover

machinelearningnews www.reddit.com

Large pretrained language models are widely used in NLP, but inference requires substantial memory. In large transformer language models at and beyond 6.7B parameters, the feed-forward and attention projection layers, along with their associated matrix multiplication operations, account for 95% of the parameters and 65-85% of the total computation. One way to reduce their size is to quantize the parameters to fewer bits and use low-bit-precision matrix multiplication. 8-bit quantization techniques for transformers have been developed with this …
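As a rough illustration of the idea, here is a minimal NumPy sketch of absmax 8-bit quantization applied to a single projection matrix multiplication. The scaling scheme and function names are illustrative assumptions for this sketch, not the specific method described in the post.

import numpy as np

def absmax_quantize(W: np.ndarray):
    # Quantize a float32 matrix to int8 by mapping the largest magnitude to 127.
    scale = 127.0 / np.max(np.abs(W))
    W_q = np.round(W * scale).astype(np.int8)
    return W_q, scale

def int8_matmul(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # Approximate X @ W: quantize both operands to int8, multiply with
    # int32 accumulation to avoid overflow, then rescale back to float.
    X_q, sx = absmax_quantize(X)
    W_q, sw = absmax_quantize(W)
    acc = X_q.astype(np.int32) @ W_q.astype(np.int32)
    return acc / (sx * sw)

# Example: a feed-forward projection whose weights take ~4x less memory than float32.
X = np.random.randn(4, 512).astype(np.float32)
W = np.random.randn(512, 2048).astype(np.float32)
print(np.max(np.abs(int8_matmul(X, W) - X @ W)))  # small quantization error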

Tags: ai, facebook, facebook ai, inference, language, language models, large language models, llm, llms, machinelearningnews, performance, researchers, tool
