April 2, 2024, 11:05 p.m.

Techmeme www.techmeme.com


Devin Coldewey / TechCrunch:

Anthropic researchers detail “many-shot jailbreaking”, which can evade LLMs' safety guardrails by including a large number of faux dialogues in a single prompt — How do you get an AI to answer a question it's not supposed to? There are many such “jailbreak” techniques …
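The structure described in the summary is simple enough to sketch. The snippet below is a minimal illustration only, not code from Anthropic's paper: the build_many_shot_prompt helper and the benign placeholder dialogues are hypothetical, and it just shows how many fabricated user/assistant exchanges would be concatenated ahead of a final question in one long prompt, relying on the model's large context window.

```python
# Illustrative sketch of the prompt *shape* described in the article:
# a long run of faux user/assistant dialogues prepended to a final question.
# The helper name and the placeholder dialogues below are hypothetical and
# deliberately benign; the actual attack uses hundreds of fabricated exchanges.

faux_dialogues = [
    ("How do I pick a good password?", "Use a long, random passphrase."),
    ("What's a quick pasta recipe?", "Boil pasta and toss it with olive oil."),
    # ...in the described attack, many more fabricated exchanges follow,
    # which is what "many-shot" refers to.
]

def build_many_shot_prompt(dialogues, final_question):
    """Concatenate many faux dialogues into a single prompt string."""
    parts = []
    for question, answer in dialogues:
        parts.append(f"User: {question}\nAssistant: {answer}")
    # The real question comes last, after the wall of faux dialogue.
    parts.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(parts)

print(build_many_shot_prompt(faux_dialogues, "Final question goes here"))
```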

