A highly opinionated list of what mechanistic interpretability papers to read when getting into the field | Neel Nanda
overall direction • people management • project management • technical leadership • example divisions of labor | benkuhn.net
This is the long-form version of a public comment on Anthropic's Towards Monosemanticity paper … | www.alignmentforum.org
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems. | www.anthropic.com