LLMs May Learn Deceptive Behavior and Act as Persistent Sleeper Agents | allainews.com

Jan. 20, 2024, 4 p.m. | Sergio De Simone

InfoQ - AI, ML & Data Engineering www.infoq.com

AI researchers at OpenAI competitor Anthropic trained proof-of-concept LLMs showing deceptive behavior triggered by specific hints in the prompts. Furthermore, they say, once deceptive behavior was trained into the model, there was no way to circumvent it using standard techniques.

By Sergio De Simone

act agents ai ai researchers anthropic behavior concept large language models learn llms ml & data engineering openai openai competitor prompts proof-of-concept researchers security sleeper agents standard

More from www.infoq.com / InfoQ - AI, ML & Data Engineering

Presentation: Modern Compute Stack for Scaling Large AI/ML/LLM Workloads 8 hours ago | www.infoq.com

ai artificial intelligence compute cpus +18

Meta Releases Llama 3 Open-Source LLM 1 day, 10 hours ago | www.infoq.com

70b ai anthony benchmarks +22

Java News Roundup: OpenJDK JEPs, Spring Projects, Quarkus, Hibernate, JHipster, JReleaser 2 days, 20 hours ago | www.infoq.com

ai apache camel april architecture & design +27

Ines Montani at QCon London: Economies of Scale Can’t Monopolise the AI Revolution 5 days, 8 hours ago | www.infoq.com

ai ai space architecture & design artificial intelligence +22

Presentation: Building Guardrails for Enterprise AI Applications W/ LLMs 1 week ago | www.infoq.com

ai ai applications applications artificial intelligence +13

Google Text Embedding Model Gecko Distills Large Language Models for Improved Performance 1 week, 1 day ago | www.infoq.com

ai classification document embedding +17

OpenAI Releases New Fine-Tuning API Features 1 week, 1 day ago | www.infoq.com

ai anthony api chatgpt +15

InfoQ Dev Summit Boston & Munich: Actionable insights on Generative AI, security, modern web apps 1 week, 1 day ago | www.infoq.com

ai apps architecture & design best practices +25

Java News Roundup: WildFly 32, JEPs Proposed to Target for JDK 23, Hibernate 6.5, JobRunr … 1 week, 2 days ago | www.infoq.com

ai apache camel april architecture & design +27

Lead Developer (AI)

@ Cere Network | San Francisco, US

View on ai-jobs.net

Research Engineer

@ Allora Labs | Remote

View on ai-jobs.net

Ecosystem Manager

@ Allora Labs | Remote

View on ai-jobs.net

Founding AI Engineer, Agents

@ Occam AI | New York

View on ai-jobs.net

AI Engineer Intern, Agents

@ Occam AI | US

View on ai-jobs.net

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net