Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.” Abstract: As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models—models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualizat...| Schneier on Security
Researchers could potentially design the next generation of ML models more quickly by delegating some work to existing models, creating a feedback loop of ever-accelerating progress.| Planned Obsolescence
We're creating incentives for AI systems to make their behavior look as desirable as possible to us, while intentionally disregarding human intent whenever that conflicts with maximizing reward.| Planned Obsolescence
AI systems that have a precise understanding of how they’ll be evaluated and what behavior we want them to display will earn more reward than AI systems that don’t.| Planned Obsolescence
Perfect alignment just means that AI systems won’t want to deliberately disregard their designers' intent; it's not enough to ensure AI is good for the world.| Planned Obsolescence