Transformers explained REMASTERED | The architecture behind LLMs

Jan. 21, 2024, 12:45 p.m. | AI Coffee Break with Letitia | www.youtube.com

All you need to know about the transformer architecture: how to structure the inputs, attention (Queries, Keys, Values), positional embeddings, and residual connections. Bonus: an overview of the difference between Recurrent Neural Networks (RNNs) and transformers.

Outline:
00:00 Transformers explained
00:47 Text inputs
02:29 Image inputs
03:57 Next word prediction / Classification
06:08 The transformer layer: 1. MLP sublayer
06:47 2. Attention explained
07:57 Attention vs. self-attention
08:35 Queries, Keys, Values
11:26 Multi-head attention
13:04 Attention scales quadratically
13:53 Positional embeddings …
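
As a rough illustration of the "Queries, Keys, Values" and "Attention scales quadratically" items in the outline, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not taken from the video: the function names (`self_attention`, `matmul`, `softmax`), the toy input `x`, and the identity projection matrices are all assumptions chosen for illustration. The point to notice is that the score matrix has one row and one column per input token, so its size grows with the square of the sequence length.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x is an n x d matrix with one row per token; w_q, w_k, w_v are
    hypothetical projection matrices that map tokens to queries, keys, values.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # n x n score matrix: every token is compared with every other token,
    # which is where the quadratic cost in sequence length comes from.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy input: 3 tokens with 2 features each (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumption: trivial projections
out, weights = self_attention(x, identity, identity, identity)
```

With 3 tokens, `weights` is a 3 x 3 matrix whose rows each sum to 1, and `out` has the same shape as `x`. Doubling the number of tokens would quadruple the number of entries in `weights`.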


