s
April 23, 2024, 3:36 a.m. |

Simon Willison's Weblog simonwillison.net

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions


By far the most detailed paper on prompt injection I've seen yet from OpenAI, published a few days ago and with six credited authors: Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke and Alex Beutel.


The paper notes that prompt injection mitigations which completely refuse any form of instruction in an untrusted prompt may not actually be ideal: some forms of instruction are harmless, and refusing them may provide …

ai alex authors eric generativeai kai lilian weng llms notes openai paper prompt prompt injection promptinjection security six training training llms

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Data Engineer (m/f/d)

@ Project A Ventures | Berlin, Germany

Principle Research Scientist

@ Analog Devices | US, MA, Boston