Jan. 19, 2022, 5:19 p.m. | Alyssa Hughes

Microsoft Research www.microsoft.com

In the last three years, the largest trained dense models have increased in size by over 1,000 times, from a few hundred million parameters to over 500 billion parameters in Megatron-Turing NLG 530B (MT-NLG). Improvements in model quality with size suggest that this trend will continue, with larger model sizes bringing better model quality. However, […]


The post DeepSpeed: Advancing MoE inference and training to power next-generation AI scale appeared first on Microsoft Research.

ai moe power research blog training

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Data Engineer

@ Parker | New York City

Sr. Data Analyst | Home Solutions

@ Three Ships | Raleigh or Charlotte, NC