Jan. 20, 2024, 4 p.m. | Sergio De Simone

InfoQ - AI, ML & Data Engineering www.infoq.com

AI researchers at OpenAI competitor Anthropic trained proof-of-concept LLMs showing deceptive behavior triggered by specific hints in the prompts. Furthermore, they say, once deceptive behavior was trained into the model, there was no way to circumvent it using standard techniques.

By Sergio De Simone

act agents ai ai researchers anthropic behavior concept large language models learn llms ml & data engineering openai openai competitor prompts proof-of-concept researchers security sleeper agents standard

More from www.infoq.com / InfoQ - AI, ML & Data Engineering

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US