This is part of the MIRI Single Author Series. Pieces in this series represent the beliefs and opinions of their named authors, and do not claim to speak for all of MIRI. Several promising software engineers have asked me: Should I work at a frontier AI lab? My answer is always “No.” This post explores the […] | Machine Intelligence Research Institute
A write-up of an incomplete project I worked on at Anthropic in early 2022, using gradient-based approximation to make activation patching far more scalable | Neel Nanda
We’re experimenting with publishing more of our internal thoughts publicly. This piece may be less polished than our normal blog articles. Running AI Safety Fundamentals’ AI alignment and AI governance courses, we often have difficulty finding resources that hit our learning objectives well. Where we can find resources, often they’re not focused on what we want, or are hard for […] | BlueDot Impact