Note: If you’ll forgive the shameless self-promotion, applications for my MATS stream are open until Sept 12. I help people write a mech interp paper…| www.alignmentforum.org
We describe an approach to tracing the “step-by-step” computation involved when a model responds to a single prompt.| Transformer Circuits
We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology.| Transformer Circuits
About how and why the habit of actually thinking about something for 5 minutes is an incredibly powerful tool for solving problems and being more creative.| Neel Nanda
Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to reason about the behavior of the entire network. The first step in that program is to identify the correct components to analyze.| transformer-circuits.pub