Building towards Coherent Extrapolated Volition with language models| aligned.substack.com
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.| www.anthropic.com
Introducing Claude 3.5 Sonnet—our most intelligent model yet. Sonnet now outperforms competitor models and Claude 3 Opus on key evaluations, at twice the speed.| www.anthropic.com
We need to measure whether LLMs could “steal” themselves| aligned.substack.com