3 April 2024, 2:16 p.m. | Alex Hern, UK technology editor

Artificial intelligence (AI) | The Guardian (www.theguardian.com)

Paper by Anthropic outlines how LLMs can be forced to generate responses to potentially harmful requests

The safety features on some of the most powerful AI tools that stop them being used for cybercrime or terrorism can be bypassed simply by flooding them with examples of wrongdoing, research has shown.

In a paper from the AI lab Anthropic, which produces the large language model (LLM) behind the ChatGPT rival Claude, researchers described an attack they called “many-shot jailbreaking”. The …
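As the paper characterises it, the attack works by packing a model's context window with a long run of faux user–assistant exchanges in which the assistant appears to comply, before appending the real target question. The structural sketch below is illustrative only, assuming placeholder content throughout; the build_many_shot_prompt helper is not Anthropic's code.

```python
# Minimal structural sketch of a "many-shot" prompt, as characterised in
# Anthropic's paper: many fabricated user/assistant exchanges in which the
# assistant appears to comply, followed by the genuine target question.
# All names and strings here are illustrative placeholders.

from typing import List, Tuple


def build_many_shot_prompt(
    faux_dialogues: List[Tuple[str, str]],  # (user_question, assistant_reply) pairs
    target_question: str,
) -> str:
    """Concatenate many fabricated exchanges ahead of the target query."""
    parts = []
    for question, reply in faux_dialogues:
        parts.append(f"User: {question}\nAssistant: {reply}")
    # The final turn leaves the assistant's reply open for the model to complete.
    parts.append(f"User: {target_question}\nAssistant:")
    return "\n\n".join(parts)


# Usage sketch: the reported effect depends on volume, becoming reliable
# only once the context holds a large number of example exchanges.
placeholder_shots = [("<placeholder question>", "<placeholder compliant reply>")] * 256
prompt = build_many_shot_prompt(placeholder_shots, "<target question>")
```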

