These are notes taken during a call with Itay Yona, an expert in software/hardware reverse engineering (SRE). Itay gave me an excellent distillation of key ideas and mindsets in the field, and we discussed analogies/disanalogies to mechanistic interpretability of neural networks. I’m ge| Neel Nanda
An intro guide to a mechanistic interpretability weekend hackathon| Neel Nanda
Introduction| Neel Nanda
A highly opinionated list of what mechanistic interpretability papers to read when getting into the field| Neel Nanda
We describe an approach to tracing the “step-by-step” computation involved when a model responds to a single prompt.| Transformer Circuits
AI progress may lead to transformative AI systems in the next decade, but we do not yet understand how to make such systems safe and aligned with human values. In response, we are pursuing a variety of research directions aimed at better understanding, evaluating, and aligning AI systems.| www.anthropic.com
Discover the ten key AI trends impacting the future of product development. In 20 years, AI has shifted from being a tech outlier to a central player in shaping product development’s trajectory. This piece sheds light on the top 10 groundbreaking AI trends redefining our digital age and offers […]| Design 1st
YouTube link| AXRP - the AI X-risk Research Podcast
The Alignment Research Center’s Theory team is starting a new hiring round for researchers with a theoretical background. Please apply here. Update January 2024: we have paused hiring and expect to reopen in the second half of 2024. We are open to expressions of interest but do not plan| Alignment Research Center