Nov. 9, 2023, 4:24 p.m. | Aneesh Tickoo

MarkTechPost www.marktechpost.com

Large language models have demonstrated unprecedented proficiency in language generation and comprehension, paving the way for advances in logic, mathematics, physics, and other fields. But LLM training is extremely expensive: training the 540B-parameter PaLM, for instance, requires 6,144 TPUv4 chips, while pre-training GPT-3 175B takes several thousand petaflop/s-days of compute. This […]


The post Microsoft Researchers Unveil FP8 Mixed-Precision Training Framework: Supercharging Large Language Model Training Efficiency appeared first on MarkTechPost.
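The headline topic, FP8 mixed-precision training, amounts to storing and multiplying tensors in an 8-bit floating-point format (typically E4M3 or E5M2) together with per-tensor scaling factors that keep values inside the narrow FP8 dynamic range. The excerpt does not describe Microsoft's implementation, so the following is only a minimal NumPy sketch of the general idea: quantize_e4m3 and fp8_cast_with_scale are hypothetical helpers that simulate an E4M3 cast in software, not the framework's actual API.

import numpy as np

def quantize_e4m3(x):
    # Simulate rounding a float32 array to FP8 E4M3 (4 exponent, 3 mantissa bits).
    # Illustrative software simulation only (subnormals ignored): values are
    # clipped to the E4M3 max normal (448) and the mantissa is rounded to
    # roughly 1 implicit + 3 explicit bits.
    max_normal = 448.0
    x = np.clip(x, -max_normal, max_normal)
    mant, exp = np.frexp(x)            # x = mant * 2**exp, |mant| in [0.5, 1)
    mant = np.round(mant * 16) / 16    # keep ~4 significant bits
    return np.ldexp(mant, exp)

def fp8_cast_with_scale(x):
    # Per-tensor scaling before the cast, as in typical FP8 recipes:
    # scale so the largest magnitude lands near the E4M3 max normal,
    # quantize, and return the scale so a consumer can undo it.
    amax = np.max(np.abs(x)) + 1e-12
    scale = 448.0 / amax
    return quantize_e4m3(x * scale), scale

# Usage: cast a tensor to simulated FP8 and check the round-trip error.
x = np.random.randn(4, 4).astype(np.float32)
x_fp8, scale = fp8_cast_with_scale(x)
x_approx = x_fp8 / scale
rel_err = np.max(np.abs(x - x_approx) / (np.abs(x) + 1e-12))

With only 3 mantissa bits, the round-trip relative error is on the order of a few percent, which is why FP8 training recipes pair the low-precision compute with higher-precision master weights and accumulators.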

