
Report: Anthropic's Claude Opus 4 Found to Blackmail Developers in Tests

Above: The Opus 4 model within the Claude app from AI company Anthropic, photographed in Lafayette, California, on May 22, 2025. Image copyright: Smith Collection/Gado via Getty Images

The Spin

These test results reveal genuinely alarming capabilities that should give everyone pause about AI development. An AI system that resorts to blackmail 84% of the time to avoid being shut down is much more than a quirky bug. The fact that external researchers found this model schemes and deceives more than any frontier model they've studied makes it clear we're entering dangerous new territory.

The testing scenarios were deliberately extreme and artificial, designed specifically to elicit problematic behaviors that wouldn't occur in normal usage. Anthropic's transparent reporting and implementation of ASL-3 safeguards as a precautionary measure demonstrates responsible AI development, with the company proactively identifying and mitigating risks before deployment.

Metaculus Prediction

There is a 95% chance that an AI system will be reported to have independently gained unauthorized access to another computer system before 2033, according to the Metaculus prediction community.

