We're creating incentives for AI systems to make their behavior look as desirable as possible, while intentionally disregarding human intent when that conflicts with maximizing reward.| Planned Obsolescence
Perfect alignment just means that AI systems won't want to deliberately disregard their designers' intent; it's not enough to ensure AI is good for the world.| Planned Obsolescence
Some arguments in favor and responses to common objections| aligned.substack.com
A few ways we might get very powerful AI systems to be safe.| Cold Takes
How big a deal could AI misalignment be? About as big as it gets.| Cold Takes