Eliciting latent knowledge: How to tell if your eyes deceive you

Paul Christiano, Ajeya Cotra, and Mark Xu
Alignment Research Center
December 2021

In this post, we’ll present ARC’s approach to an open problem we think is central to aligning powerful machine learning (ML) systems: Suppose we ...