Jan. 16, 2024, 9:30 p.m. | Thomas Claburn

The Register - Software: AI + ML www.theregister.com

Today's safety guardrails won't catch these backdoors, study warns

Analysis  AI biz Anthropic has published research showing that large language models (LLMs) can be subverted in a way that safety training doesn't currently address.…

agent ai assistants analysis anthropic assistants code guardrails language language models large language large language models llms research safety study training

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote