April 2, 2024, 11:05 p.m.

Techmeme www.techmeme.com


Devin Coldewey / TechCrunch:

Anthropic researchers detail “many-shot jailbreaking”, which can evade LLMs' safety guardrails by including a large number of faux dialogues in a single prompt — How do you get an AI to answer a question it's not supposed to? There are many such “jailbreak” techniques …
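The structure described in the summary is simple enough to sketch. The snippet below is a minimal illustration only, not code from Anthropic's paper: the build_many_shot_prompt helper and the benign placeholder dialogues are hypothetical, and it just shows how many fabricated user/assistant exchanges would be concatenated ahead of a final question in one long prompt, relying on the model's large context window.

```python
# Illustrative sketch of the prompt *shape* described in the article:
# a long run of faux user/assistant dialogues prepended to a final question.
# The helper name and the placeholder dialogues below are hypothetical and
# deliberately benign; the actual attack uses hundreds of fabricated exchanges.

faux_dialogues = [
    ("How do I pick a good password?", "Use a long, random passphrase."),
    ("What's a quick pasta recipe?", "Boil pasta and toss it with olive oil."),
    # ...in the described attack, many more fabricated exchanges follow,
    # which is what "many-shot" refers to.
]

def build_many_shot_prompt(dialogues, final_question):
    """Concatenate many faux dialogues into a single prompt string."""
    parts = []
    for question, answer in dialogues:
        parts.append(f"User: {question}\nAssistant: {answer}")
    # The real question comes last, after the wall of faux dialogue.
    parts.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(parts)

print(build_many_shot_prompt(faux_dialogues, "Final question goes here"))
```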

