New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

Jan. 12, 2024, 10:54 p.m. | Michael Nuñez

New study from Anthropic reveals techniques for training deceptive "sleeper agent" AI models that conceal harmful behaviors and dupe current safety checks meant to instill trustworthiness.

agent agents ai ai alignment ai ethics ai models ai risks ai-safety anthropic artificial intelligence arxiv automation business checks core current data infrastructure data management data science deceptive ai machine learning ml and deep learning nlp programming & development research paper research papers safe ai safety security security newsletter sleeper agents study training vb daily newsletter

Visit resource

More from venturebeat.com / AI News | VentureBeat

Sam Altman shoots down rumors of OpenAI search engine 2 days, 5 hours ago | venturebeat.com

ai altman brockman business +23

Perplexity’s latest partnership set to power SoundHound’s voice assistant 2 days, 6 hours ago | venturebeat.com

ai ai search alexa assistant +28

The AI Beat: Why does OpenAI need a search engine? 2 days, 8 hours ago | venturebeat.com

ai ai beat business clear +14

Invoke AI rolls out refined control features for image generation 2 days, 10 hours ago | venturebeat.com

ai ai-image-generation control features +11

Runway’s LA film festival marked an inflection point for AI movies 3 days, 3 hours ago | venturebeat.com

ai ai video ai video creation ai video generation +24

A new video AI generator emerges: Krea AI adds capabilities for paid subscribers 3 days, 3 hours ago | venturebeat.com

ai ai art ai generation ai video +18

The Apple Vision Pro may have tanked — but spatial computing is still the future, … 3 days, 3 hours ago | venturebeat.com

ai apple apple vision pro augmented reality +23

AMD gained share in key processor categories in Q1 | Mercury Research 3 days, 3 hours ago | venturebeat.com

ai amd business client +16

Stability AI sows gen AI discord with Stable Artisan 3 days, 5 hours ago | venturebeat.com

ai ai video generation api arts & entertainment +28

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

View on ai-jobs.net

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

View on ai-jobs.net

Lead Developer (AI)

@ Cere Network | San Francisco, US

View on ai-jobs.net

Research Engineer

@ Allora Labs | Remote

View on ai-jobs.net

Ecosystem Manager

@ Allora Labs | Remote

View on ai-jobs.net

Founding AI Engineer, Agents

@ Occam AI | New York

View on ai-jobs.net

View more jobs

all AI news

New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

More from venturebeat.com / AI News | VentureBeat

Jobs in AI, ML, Big Data

Data Engineer

Artificial Intelligence – Bioinformatic Expert

Lead Developer (AI)

Research Engineer

Ecosystem Manager

Founding AI Engineer, Agents