Most AI safety conversations centre on alignment: ensuring AI systems share our values and goals. But despite progress, we’re unlikely to know we’ve solved the problem before the arrival of human-level and superhuman systems in as little as three years.| 80,000 Hours
A paper from Anthropic's Alignment Science team on alignment faking in large language models| www.anthropic.com