Jan. 12, 2024, 10:54 p.m. | Michael Nuñez

AI News | VentureBeat venturebeat.com

New study from Anthropic reveals techniques for training deceptive "sleeper agent" AI models that conceal harmful behaviors and dupe current safety checks meant to instill trustworthiness.

agent agents ai ai alignment ai ethics ai models ai risks ai-safety anthropic artificial intelligence arxiv automation business checks core current data infrastructure data management data science deceptive ai machine learning ml and deep learning nlp programming & development research paper research papers safe ai safety security security newsletter sleeper agents study training vb daily newsletter

More from venturebeat.com / AI News | VentureBeat

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Risk Management - Machine Learning and Model Delivery Services, Product Associate - Senior Associate-

@ JPMorgan Chase & Co. | Wilmington, DE, United States

Senior ML Engineer (Speech/ASR)

@ ObserveAI | Bengaluru