all AI news
LLMs May Learn Deceptive Behavior and Act as Persistent Sleeper Agents
Jan. 20, 2024, 4 p.m. | Sergio De Simone
InfoQ - AI, ML & Data Engineering www.infoq.com
AI researchers at OpenAI competitor Anthropic trained proof-of-concept LLMs showing deceptive behavior triggered by specific hints in the prompts. Furthermore, they say, once deceptive behavior was trained into the model, there was no way to circumvent it using standard techniques.
By Sergio De Simoneact agents ai ai researchers anthropic behavior concept large language models learn llms ml & data engineering openai openai competitor prompts proof-of-concept researchers security sleeper agents standard
More from www.infoq.com / InfoQ - AI, ML & Data Engineering
Meta Releases Llama 3 Open-Source LLM
1 day, 10 hours ago |
www.infoq.com
OpenAI Releases New Fine-Tuning API Features
1 week, 1 day ago |
www.infoq.com
Jobs in AI, ML, Big Data
Lead Developer (AI)
@ Cere Network | San Francisco, US
Research Engineer
@ Allora Labs | Remote
Ecosystem Manager
@ Allora Labs | Remote
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote