LLMs May Learn Deceptive Behavior and Act as Persistent Sleeper Agents | allainews.com

Jan. 20, 2024, 4 p.m. | Sergio De Simone

InfoQ - AI, ML & Data Engineering www.infoq.com

AI researchers at OpenAI competitor Anthropic trained proof-of-concept LLMs showing deceptive behavior triggered by specific hints in the prompts. Furthermore, they say, once deceptive behavior was trained into the model, there was no way to circumvent it using standard techniques.

By Sergio De Simone

act agents ai ai researchers anthropic behavior concept large language models learn llms ml & data engineering openai openai competitor prompts proof-of-concept researchers security sleeper agents standard

More from www.infoq.com / InfoQ - AI, ML & Data Engineering

Podcast: Deepthi Sigireddi on Distributed Database Architecture in the Cloud Native Era 2 hours ago | www.infoq.com

ai architecture architecture & design cloud +19

Java News Roundup: OpenJDK Updates, Piranha Cloud, Spring Data 2024.0.0, GlassFish, Micrometer 10 hours ago | www.infoq.com

ai apache tomcat architecture & design cloud +27

Google Launches Gemini 1.5 Flash for Lower-Latency and More Efficient AI Serving 20 hours ago | www.infoq.com

ai ai models context context window +15

Uber Migrates 1 Trillion Records from DynamoDB to LedgerStore to Save $6 Million Annually 1 day, 7 hours ago | www.infoq.com

ai architecture & design aws big data +23

Enhanced Security for Enterprises: Google Launches Google Threat Intelligence 1 day, 7 hours ago | www.infoq.com

ai analysis cloud cloud security +24

Rider 2024.1: New Monitoring Tool Window, Collection Vizualizer, .NET Aspire, AI Assistant Plugin 1 day, 17 hours ago | www.infoq.com

ai ai assistant architecture & design artificial intelligence +17

Presentation: Streaming Databases: Embracing the Convergence of Stream Processing and Databases 3 days, 4 hours ago | www.infoq.com

ai convergence database databases +14

Hugging Face Unveils LeRobot, an Open-Source Machine Learning Model for Robotics 4 days, 2 hours ago | www.infoq.com

advanced ai applications daniel +17

Apple Open-Sources One Billion Parameter Language Model OpenELM 6 days ago | www.infoq.com

ai anthony apple attention +12

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

View on ai-jobs.net

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

View on ai-jobs.net

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

View on ai-jobs.net

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

View on ai-jobs.net

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

View on ai-jobs.net

Lead Developer (AI)

@ Cere Network | San Francisco, US

View on ai-jobs.net