New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

Jan. 12, 2024, 10:54 p.m. | Michael Nuñez

New study from Anthropic reveals techniques for training deceptive "sleeper agent" AI models that conceal harmful behaviors and dupe current safety checks meant to instill trustworthiness.

agent agents ai ai alignment ai ethics ai models ai risks ai-safety anthropic artificial intelligence arxiv automation business checks core current data infrastructure data management data science deceptive ai machine learning ml and deep learning nlp programming & development research paper research papers safe ai safety security security newsletter sleeper agents study training vb daily newsletter

Visit resource

More from venturebeat.com / AI News | VentureBeat

Cohere’s new AI Command-R+ is available on Sagemaker and Azure…but not Bedrock (yet) 3 weeks, 5 days ago | venturebeat.com

ai azure bedrock business +19

Anyscale addresses critical vulnerability on Ray framework — but thousands were still exposed 3 weeks, 5 days ago | venturebeat.com

ai ai workloads anyscale cloud and data storage security +21

Race to the gen AI edge heats up as Dell invests in SiMa.ai 3 weeks, 5 days ago | venturebeat.com

ai ai at the edge ai edge business & industrial +35

Gretel releases world’s largest open source text-to-SQL dataset, empowering businesses to unlock AI’s potential 3 weeks, 5 days ago | venturebeat.com

accessibility ai ai training ai training data +35

Quantum computing startup Infleqtion names Matthew Kinsella as CEO 3 weeks, 5 days ago | venturebeat.com

ai business ceo change +21

Microsoft boosts Azure AI Search with more storage and support for big RAG apps 3 weeks, 5 days ago | venturebeat.com

ai ai azure search more affordable ai search apps +17

With little urging, Grok will detail how to make bombs, concoct drugs (and much, much … 3 weeks, 6 days ago | venturebeat.com

ai ai researchers anthropic bombs +26

DataStax acquires Langflow to accelerate enterprise generative AI app development 3 weeks, 6 days ago | venturebeat.com

adoption ai ai app ai-app-development +31

Salesforce strengthens Mulesoft with AI tools to extract data, automate workflows 3 weeks, 6 days ago | venturebeat.com

ai ai tools anypoint code builder artificial intelligence +40

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Risk Management - Machine Learning and Model Delivery Services, Product Associate - Senior Associate-

@ JPMorgan Chase & Co. | Wilmington, DE, United States

View on ai-jobs.net

Senior ML Engineer (Speech/ASR)

@ ObserveAI | Bengaluru

View on ai-jobs.net

View more jobs

all AI news

New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

More from venturebeat.com / AI News | VentureBeat

Jobs in AI, ML, Big Data

Data Architect

Data ETL Engineer

Lead GNSS Data Scientist

Senior Machine Learning Engineer (MLOps)

Risk Management - Machine Learning and Model Delivery Services, Product Associate - Senior Associate-

Senior ML Engineer (Speech/ASR)