Jan. 20, 2024, 4 p.m. | Sergio De Simone

InfoQ - AI, ML & Data Engineering www.infoq.com

AI researchers at OpenAI competitor Anthropic trained proof-of-concept LLMs showing deceptive behavior triggered by specific hints in the prompts. Furthermore, they say, once deceptive behavior was trained into the model, there was no way to circumvent it using standard techniques.

By Sergio De Simone

act agents ai ai researchers anthropic behavior concept large language models learn llms ml & data engineering openai openai competitor prompts proof-of-concept researchers security sleeper agents standard

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote