Unlike everyone else, Anthropic actually Does (Some of) the Research. That means they report all the insane behaviors you can potentially get their models to do, what causes those behaviors, how they addressed this and what we can learn. It is a treasure trove. And then they react reasonably, in this case imposing their ASL-3 safeguards on Opus 4. That’s right, Opus. We are so back.| thezvi.substack.com
Discover Claude 4's breakthrough AI capabilities. Experience more reliable, interpretable assistance for complex tasks across work and learning.| www.anthropic.com