Multi-Query Attention Explained

Nov. 17, 2023, 4:02 p.m. | florian

Multi-Query Attention (MQA) is a variant of the attention mechanism that speeds up token generation in the decoder while largely preserving model quality.

MQA is widely used in the era of large language models: many LLMs adopt it, including Falcon, PaLM, and StarCoder.
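One common explanation for the decoding speed-up is memory traffic. During autoregressive generation, the keys and values of all previous tokens are cached and re-read at every step, and MQA keeps a single key/value head instead of one per query head. The sketch below estimates the cache size under that view. The function name `kv_cache_bytes` and the model dimensions are hypothetical, chosen only for illustration, and fp16 storage (2 bytes per element) is assumed.

```python
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, batch, bytes_per_elem=2):
    """Estimate bytes needed to cache keys and values during decoding.

    Assumption: fp16 storage by default (2 bytes per element).
    """
    # Factor 2: one cached tensor for keys and one for values, per layer.
    return 2 * n_layers * n_kv_heads * d_head * seq_len * batch * bytes_per_elem


# Hypothetical configuration, not taken from any specific LLM.
N_LAYERS, N_HEADS, D_HEAD = 32, 32, 128
SEQ_LEN, BATCH = 2048, 8

mha = kv_cache_bytes(N_LAYERS, N_HEADS, D_HEAD, SEQ_LEN, BATCH)  # one K/V head per query head
mqa = kv_cache_bytes(N_LAYERS, 1, D_HEAD, SEQ_LEN, BATCH)        # a single shared K/V head

print(f"MHA KV cache: {mha / 2**30:.2f} GiB")  # MHA KV cache: 8.00 GiB
print(f"MQA KV cache: {mqa / 2**30:.2f} GiB")  # MQA KV cache: 0.25 GiB
print(f"reduction: {mha // mqa}x")             # reduction: 32x
```

Under these assumptions the cache, and therefore the key/value data read at each decoding step, shrinks by a factor equal to the number of heads.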

Multi-Head Attention (MHA)

Before introducing MQA, let’s first review the default attention mechanism of the transformer.

Multi-Head Attention (MHA) is the default attention mechanism of the transformer model, as shown in Figure 1: …
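The difference between MHA and MQA can be sketched in plain Python for a single decoding step. This is a simplified illustration, not the article's implementation: the learned query, key, value, and output projections are omitted, and the helper names (`attend`, `multi_head`, `multi_query`) are hypothetical. In MHA each head attends over its own keys and values; in MQA every query head shares one set.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention for one query vector over cached key/value vectors."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors, component by component.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def multi_head(queries, keys_per_head, values_per_head):
    """MHA sketch: head i uses its own keys_per_head[i] and values_per_head[i]."""
    return [attend(q, k, v) for q, k, v in zip(queries, keys_per_head, values_per_head)]


def multi_query(queries, shared_keys, shared_values):
    """MQA sketch: every query head attends over the same single set of keys and values."""
    return [attend(q, shared_keys, shared_values) for q in queries]


# Toy example: 2 query heads, 3 cached positions, head dimension 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

mqa_out = multi_query(queries, K, V)
# Copying the shared K/V into every head makes MHA compute exactly what MQA computes.
assert mqa_out == multi_head(queries, [K, K], [V, V])
```

Because MQA only changes where the keys and values come from, the query side keeps all of its heads; what shrinks is the number of key/value sets that must be stored and read.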
