all AI news
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Simon Willison's Weblog simonwillison.net
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
By far the most detailed paper on prompt injection I've seen yet from OpenAI, published a few days ago and with six credited authors: Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke and Alex Beutel.
The paper notes that prompt injection mitigations which completely refuse any form of instruction in an untrusted prompt may not actually be ideal: some forms of instruction are harmless, and refusing them may provide …
ai alex authors eric generativeai kai lilian weng llms notes openai paper prompt prompt injection promptinjection security six training training llms