‘Many-shot jailbreak’: lab reveals how AI safety features can be easily bypassed

April 3, 2024, 2:16 p.m. | Alex Hern, UK technology editor

Artificial intelligence (AI) | The Guardian www.theguardian.com

Paper by Anthropic outlines how LLMs can be forced to generate responses to potentially harmful requests

The safety features on some of the most powerful AI tools that stop them being used for cybercrime or terrorism can be bypassed simply by flooding them with examples of wrongdoing, research has shown.

In a paper from the AI lab Anthropic, which produces the large language model (LLM) behind the ChatGPT rival Claude, researchers described an attack they called “many-shot jailbreaking”. The …


