How can a science of LLMs keep up with technological development?| seantrott.substack.com
A write-up of an incomplete project I worked on at Anthropic in early 2022, using gradient-based approximation to make activation patching far more scalable (see the first sketch after this list).| Neel Nanda
Three hard questions for a new paradigm.| seantrott.substack.com
Do we need a CERN for LLM-ology?| seantrott.substack.com
How to get started studying LLMs.| seantrott.substack.com
Trying to peek inside the "black box".| seantrott.substack.com
Sparse autoencoders (SAEs) have recently become popular for interpreting machine learning models (although sparse dictionary learning has been around since 1997). LLMs are becoming more powerful and useful, but they remain black boxes: we don't understand how they do what they are capable of, and it would clearly be valuable to find out (a minimal SAE sketch appears after this list).| Adam Karvonen
Measuring what you want to measure is hard.| seantrott.substack.com
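The Nanda entry above describes what is now usually called attribution patching: instead of re-running the model once per activation patch, a single forward and backward pass on the corrupted input gives a first-order estimate of every patch's effect at once. Here is a minimal PyTorch sketch of that idea; the names (`metric_fn`, `clean_acts`, `corrupt_acts`) and the assumption that `metric_fn` re-runs the model from a dict of cached activations and returns a scalar are illustrative, not the post's actual code.

```python
import torch

def attribution_patch(metric_fn, clean_acts: dict, corrupt_acts: dict) -> dict:
    """Gradient-based approximation of activation patching.

    For every cached activation, estimate how much the scalar metric would
    change if we patched in the clean activation:

        delta_metric ≈ sum((clean_act - corrupt_act) * d(metric)/d(corrupt_act))

    True activation patching needs one forward pass per activation; this
    approximation needs a single forward and backward pass.
    """
    # Make leaf tensors so .grad is populated for each activation.
    corrupt_acts = {k: v.detach().clone().requires_grad_(True)
                    for k, v in corrupt_acts.items()}
    metric = metric_fn(corrupt_acts)  # forward pass from the cached activations
    metric.backward()                 # one backward pass -> gradients for all of them
    with torch.no_grad():
        return {
            name: ((clean_acts[name] - act) * act.grad).sum().item()
            for name, act in corrupt_acts.items()
        }
```

The linear approximation is exact only when the metric is locally linear in the patched activation, which is why the post frames it as a scalable screening tool rather than a replacement for true patching.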
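The Karvonen entry describes SAEs for interpretability. A minimal sketch of the standard recipe, assuming the common ReLU-encoder/linear-decoder setup with a reconstruction loss plus an L1 sparsity penalty; the dimensions and `l1_coeff` value are illustrative, not taken from the post.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: expand d_model activations into an overcomplete set of
    d_hidden features, with an L1 penalty pushing most features to zero."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse, non-negative features
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error keeps the features faithful to the model's
    # activations; the L1 term keeps only a few features active per input.
    mse = (reconstruction - x).pow(2).mean()
    sparsity = features.abs().sum(dim=-1).mean()
    return mse + l1_coeff * sparsity
```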