We’re experimenting with publishing more of our internal thoughts publicly. This piece may be less polished than our normal blog articles. Running AI Safety Fundamentals’ AI alignment and AI governance courses, we often have difficulty finding resources that hit our learning objectives well. Where we can find resources, they’re often not focused on what we want, or are hard for […]| BlueDot Impact
In this post, we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply…| www.lesswrong.com (cross-posted to www.alignmentforum.org)
Even when you try to do good, you can end up doing accidental harm. But there are ways you can minimise the risks.| 80,000 Hours
Week 3 of the AI alignment curriculum. Goal misgeneralization refers to scenarios in which agents in new situations generalize to behaving in competent yet undesirable ways because they learned the wrong goals from previous training. Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals (Shah, 2022), blog post: a correct specification is needed for the learner to have the right context (so it doesn’t exploit bugs), but doesn’t automatically result in correct goals. If ...| ahiru.pl
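A minimal toy sketch of that failure mode (my own illustration, not code from Shah's post or the linked notes): a classifier is trained on data where an easy spurious cue perfectly tracks the intended label, so it looks perfectly aligned during training while actually latching onto the wrong feature; when the correlation breaks at deployment, it keeps competently following the spurious cue. All names here (`make_data`, `shape`, `colour`) are invented for illustration.

```python
# Toy sketch of goal misgeneralization as spurious-correlation learning.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, colour_tracks_label):
    """Two cues for a binary label: a noisy 'shape' cue that always carries
    the intended signal, and a clean 'colour' cue that tracks the label
    during training but is independent of it at deployment."""
    y = rng.integers(0, 2, n)                      # the intended goal/label
    shape = (2 * y - 1) + rng.normal(0.0, 1.0, n)  # true cue, but noisy
    if colour_tracks_label:
        colour = (2 * y - 1).astype(float)         # spuriously perfect cue
    else:
        colour = rng.choice([-1.0, 1.0], size=n)   # correlation broken
    return np.stack([shape, colour], axis=1), y, colour

X_tr, y_tr, _ = make_data(2000, colour_tracks_label=True)
X_te, y_te, colour_te = make_data(2000, colour_tracks_label=False)

# Plain logistic regression (with a bias column), trained by gradient descent.
Xb_tr = np.hstack([X_tr, np.ones((len(X_tr), 1))])
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Xb_tr @ w)))
    w -= 0.5 * Xb_tr.T @ (p - y_tr) / len(y_tr)

Xb_te = np.hstack([X_te, np.ones((len(X_te), 1))])
pred = (Xb_te @ w > 0).astype(int)

print("learned weights (shape, colour):", np.round(w[:2], 2))
print("test accuracy vs intended label:", (pred == y_te).mean())
print("test agreement with colour cue: ", (pred == (colour_te > 0)).mean())
# Typically the colour weight dominates: accuracy against the intended label
# falls toward chance, while agreement with the spurious cue stays near 1.0,
# i.e. the model competently pursues the wrong goal.
```

The specification here is "correct" in the sense that the training labels are never wrong; the learner still ends up with the wrong goal because nothing in training distinguishes the intended cue from the spurious one.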
The following is an edited transcript of a talk I gave. I have given this talk at multiple places, including first at Anthropic and then for ELK winners…| www.alignmentforum.org
On agency: the mindset of being able to look past defaults and constraints, and find ways to take action to achieve your goals. Examining what’s holding you back, understanding what agency feels like, and concrete advice on how to cultivate it.| Neel Nanda
PASTA: Process for Automating Scientific and Technological Advancement.| Cold Takes
Why would we program AI that wants to harm us? Because we might not know how to do otherwise.| Cold Takes
People are far better at their jobs than at anything else. Here are the best ways to help the most important century go well.| Cold Takes
Hypothetical stories where the world tries, but fails, to avert a global disaster.| Cold Takes
A few ways we might get very powerful AI systems to be safe.| Cold Takes
Four analogies for why "We don't see any misbehavior by this AI" isn't enough.| Cold Takes
Today's AI development methods risk training AIs to be deceptive, manipulative and ambitious. This might not be easy to fix as it comes up.| Cold Takes
The "most important century" series of blog posts argues that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar| Cold Takes