We describe an approach to tracing the “step-by-step” computation involved when a model responds to a single prompt.| Transformer Circuits
We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology.| Transformer Circuits
Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model generally available on the market.| www.anthropic.com
A paper from Anthropic describing a new way to guard LLMs against jailbreaking.| www.anthropic.com
We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first-ever detailed look inside a modern, production-grade large language model.| www.anthropic.com