New paper walkthrough: In-Context Learning and Induction Heads. This is the second paper in Anthropic's Transformer Circuits thread, a series of papers that aims to reverse-engineer transformer language models. I read through it with Charles Frye (of Full-Stack Deep Learning); we discuss the paper and share our takes and intuitions. See the original paper and a Twitter thread of my takeaways.
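For context on what the paper studies: an induction head is an attention head that completes repeated patterns in-context: given a sequence containing "… [A][B] … [A]", it predicts "[B]" by attending to the token that followed the previous occurrence of [A] and copying it. A minimal sketch of that behavior (my own toy illustration, not code from the paper):

```python
def induction_predict(tokens):
    """Toy model of the induction-head pattern: find the most recent
    earlier occurrence of the final token and copy what followed it."""
    last = tokens[-1]
    # Scan backwards over earlier positions for a matching token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # No earlier occurrence: no induction-based prediction.

print(induction_predict(["The", "cat", "sat", ".", "The"]))  # -> "cat"
```

A real induction head implements this with two composed attention operations (a previous-token head feeding a prefix-matching head); the sketch only shows the input-output behavior the paper associates with such heads.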