NeurIPS 2023| trojandetection.ai
An informal description of ARC’s current research approach, follow-up to Eliciting Latent Knowledge| Alignment Research Center