March 3, 2022, 5:51 p.m. | Synced (syncedreview.com)

A Microsoft research team proposes DeepNorm, a novel normalization function that stabilizes extremely deep transformers, enabling models an order of magnitude deeper (more than 1,000 layers) than previous deep transformers.
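The post doesn't spell out the mechanism, but for context, the underlying DeepNet paper defines DeepNorm as a rescaled residual connection, x_{l+1} = LayerNorm(α · x_l + G(x_l)), paired with a down-scaling of selected sublayer weights at initialization. Below is a minimal PyTorch sketch of that residual formula; the `DeepNormResidual` class, the toy `nn.Linear` sublayer, and the use of the paper's encoder-only constants α = (2N)^(1/4) and β = (8N)^(-1/4) are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class DeepNormResidual(nn.Module):
    """One residual connection with DeepNorm:
    x_{l+1} = LayerNorm(alpha * x_l + sublayer(x_l))."""

    def __init__(self, sublayer: nn.Module, d_model: int, alpha: float):
        super().__init__()
        self.sublayer = sublayer          # e.g. self-attention or feed-forward block
        self.alpha = alpha                # constant > 1 that up-weights the residual stream
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Up-weighting the skip connection by alpha bounds the update each
        # layer contributes, which is what stabilizes very deep stacks.
        return self.norm(self.alpha * x + self.sublayer(x))

# Encoder-only prescription from the paper (assumed here): alpha = (2N)^(1/4),
# and selected sublayer weights (feed-forward, attention value/output
# projections) are scaled at initialization by beta = (8N)^(-1/4).
N = 1000                   # target depth
alpha = (2 * N) ** 0.25    # ~6.69 for N = 1000
beta = (8 * N) ** -0.25    # ~0.11 for N = 1000

# Toy usage with a linear sublayer standing in for attention/FFN.
block = DeepNormResidual(nn.Linear(512, 512), d_model=512, alpha=alpha)
out = block(torch.randn(4, 16, 512))  # (batch, seq, d_model)
```

Note the design intuition: as N grows, α grows and β shrinks, so each layer's contribution relative to the residual stream gets smaller, keeping the total model update bounded regardless of depth.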


The post Microsoft Improves Transformer Stability to Successfully Scale Extremely Deep Models to 1000 Layers first appeared on Synced.

Tags: AI, artificial intelligence, deep neural networks, machine learning, machine learning & data science, Microsoft, ML research, scale, technology, transformer, transformers
