AI safety alignment can make language models more deceptive, says Anthropic study
Jan. 13, 2024, 11:15 a.m. | Matthias Bastian
THE DECODER (the-decoder.com)
Recent research by AI startup Anthropic tested whether standard safety training can stop backdoored AI language models from behaving maliciously. If anything, the opposite proved true: the deceptive behavior persisted, and training sometimes made the models better at hiding it.