Understanding Ghost Attention in LLaMa 2
Feb. 20, 2024, 3:52 p.m. | Matthew Gunton
Towards Data Science - Medium towardsdatascience.com
This blog post explains the Ghost Attention method of fine-tuning introduced in the LLaMa 2 paper.
[DALL-E generated image of a ghost llama]
The Problem
Often, we want to give an LLM an instruction once and have it follow that instruction until told otherwise. As the example below shows, however, LLMs can quickly forget instructions after a few turns of dialogue.
Figure 9 from the LLaMa 2 paper, illustrating how instructions can be ignored after a few turns of dialogue.
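The Ghost Attention fix described in the LLaMa 2 paper hinges on how fine-tuning samples are built: the instruction is concatenated to every user turn when generating synthetic dialogues, but at training time it is kept only in the first turn, with the loss zeroed on all but the final assistant reply. The following is a minimal sketch of that sample-construction step; the function name, whitespace tokenization, and mask layout are all illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of Ghost Attention (GAtt) training-sample construction.
# Tokenization by .split() and the helper name are illustrative only.

def build_gatt_sample(instruction, dialogue):
    """dialogue: list of (user_msg, assistant_msg) turns.

    Per the LLaMa 2 paper's description: keep the instruction only in
    the first user turn, and zero the loss on every token except the
    final assistant reply, so the model must honor an instruction seen
    many turns earlier.
    """
    tokens, loss_mask = [], []
    last = len(dialogue) - 1
    for i, (user_msg, assistant_msg) in enumerate(dialogue):
        prefix = f"{instruction} {user_msg}" if i == 0 else user_msg
        for tok in prefix.split():
            tokens.append(tok)
            loss_mask.append(0)  # never train on user tokens
        for tok in assistant_msg.split():
            tokens.append(tok)
            loss_mask.append(1 if i == last else 0)  # loss only on final reply
    return tokens, loss_mask

tokens, mask = build_gatt_sample(
    "Always answer in French.",
    [("Hi!", "Bonjour !"), ("How are you?", "Très bien, merci.")],
)
```

Zeroing the loss on earlier turns (rather than dropping them) keeps the full context visible to attention while only optimizing the final, instruction-consistent reply.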