Turing machines | plato.stanford.edu
A neural network that learns efficient data encodings in an unsupervised manner | en.wikipedia.org
Trying to peek inside the "black box". It's useful to try to peek inside the black box, but it should be done rigorously. | seantrott.substack.com
When we turn up the strength of the “Golden Gate Bridge” feature, Claude’s responses begin to focus on the Golden Gate Bridge. For a short time, we’re making this model available for everyone to interact with. | www.anthropic.com
Modern language models predict "tokens", not words. But what exactly are tokens? | seantrott.substack.com
When we use LLMs as "model organisms", which humans are we modeling? And how can we overcome the problem of unrepresentative data? | seantrott.substack.com
Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to reason about the behavior of the entire network. The first step in that program is to identify the correct components to analyze. | transformer-circuits.pub
Want to really understand how large language models work? Here’s a gentle primer. | www.understandingai.org