From: Neel Nanda
Attribution Patching: Activation Patching At Industrial Scale — Neel Nanda
https://www.neelnanda.io/mechanistic-interpretability/attribution-patching
A write-up of an incomplete project I worked on at Anthropic in early 2022, using a gradient-based approximation to make activation patching far more scalable.