Mechanistic interpretability seeks to reverse engineer neural networks, similar to how one might reverse engineer a compiled binary computer program. After all, neural network parameters are in some sense a binary computer program which runs on one of the exotic virtual machines we call a neural network architecture.| www.transformer-circuits.pub
PASTA: Process for Automating Scientific and Technological Advancement.| Cold Takes
Why would we program AI that wants to harm us? Because we might not know how to do otherwise.| Cold Takes
Today's AI development methods risk training AIs to be deceptive, manipulative and ambitious. This might not be easy to fix as it comes up.| Cold Takes
The "most important century" series of blog posts argues that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar …| Cold Takes
How big a deal could AI misalignment be? About as big as it gets.| Cold Takes