AI safety alignment can make language models more deceptive, says Anthropic study

Jan. 13, 2024, 11:15 a.m. | Matthias Bastian

Recent research by AI startup Anthropic tested whether it is possible to prevent backdoored AI language models from behaving maliciously. If anything, the opposite seems to be the case.

The article AI safety alignment can make language models more deceptive, says Anthropic study appeared first on THE DECODER.

ai in practice ai language models alignment anthropic article artificial intelligence case decoder language language models research safety startup study the decoder

Visit resource

More from the-decoder.com / THE DECODER

Microsoft invested in OpenAI over fears of Google's AI dominance 13 hours ago | the-decoder.com

ai in practice antitrust article artificial intelligence +12

The future of AI language models may lie in predicting beyond the next word, study … 16 hours ago | the-decoder.com

ai language models ai research article artificial intelligence +21

Microsoft invests in humanoid robots with start-up Sanctuary AI 19 hours ago | the-decoder.com

ai and robotics ai research article artificial intelligence +8

Experts call for swift action against autonomous weapons in "Oppenheimer moment" 20 hours ago | the-decoder.com

ai and safety ai and society ai and warfare article +23

OpenAI CEO Sam Altman says GPT-4 is the dumbest AI model you'll ever have to … 20 hours ago | the-decoder.com

ai in practice ai model altman article +14

Anthropic's AI assistant Claude gets an iOS app and new team plan for businesses 1 day, 15 hours ago | the-decoder.com

ai assistant ai in practice anthropic app +13

Nvidia's free local chatbot adds new AI models, image search, and voice input 1 day, 16 hours ago | the-decoder.com

ai in practice ai models application article +17

Microsoft and Axel Springer plan ad-funded AI chatbots for news 1 day, 19 hours ago | the-decoder.com

advertising ai and media ai chatbots ai in practice +19

Reddit users compile list of words and phrases that unmask ChatGPT's writing style 1 day, 20 hours ago | the-decoder.com

ai in practice article artificial intelligence become +16

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

AI Engineering Manager

@ M47 Labs | Barcelona, Catalunya [Cataluña], Spain

View on ai-jobs.net

View more jobs

all AI news

AI safety alignment can make language models more deceptive, says Anthropic study

More from the-decoder.com / THE DECODER

Jobs in AI, ML, Big Data

AI Research Scientist

Data Architect

Data ETL Engineer

Lead GNSS Data Scientist

Senior Machine Learning Engineer (MLOps)

AI Engineering Manager