Feb. 20, 2024, 3:52 p.m. | Matthew Gunton

Towards Data Science (towardsdatascience.com)

This blog post explains the Ghost Attention (GAtt) fine-tuning method introduced in the Llama 2 paper.

DALL-E generated image of a ghost llama

The Problem

Oftentimes, we want an LLM to be given an instruction once and then follow it until told otherwise. Nevertheless, as the example below shows, LLMs can quickly forget instructions after a few turns of dialogue.

Figure 9 from the Llama 2 paper, illustrating how instructions can be ignored after a few turns of dialogue …
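To make the idea concrete, here is a minimal sketch of the data-construction step behind Ghost Attention as described in the Llama 2 paper: an instruction is synthetically concatenated to every user turn of a dialogue so that generated replies respect it, and the training sample then keeps the instruction only in the first turn. The function name and data layout below are illustrative assumptions, not from any library.

```python
# Hedged sketch of GAtt-style sample construction (names are hypothetical).
def build_gatt_sample(instruction, turns):
    """instruction: a system-level rule, e.g. "Always answer in French."
    turns: list of (user_msg, assistant_msg) pairs for one dialogue."""
    # Step 1: concatenate the instruction to every user message; in the
    # paper, assistant replies are then regenerated from these prompts.
    augmented = [(f"{instruction} {user}", assistant)
                 for user, assistant in turns]
    # Step 2: for the actual fine-tuning sample, keep the instruction only
    # in the first user turn and drop it from all later turns.
    return [augmented[0]] + turns[1:]

sample = build_gatt_sample(
    "Always answer in French.",
    [("Hello!", "Bonjour !"), ("How are you?", "Je vais bien, merci.")],
)
```

The paper additionally zeroes out the training loss on tokens from earlier turns, so the model is only penalized for failing to follow the instruction in the latest reply; that detail is omitted here for brevity.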
