ARC has released a paper on "Backdoor defense, learnability and obfuscation", in which we study a formal notion of backdoors in ML models. Part of our m… | www.lesswrong.com
An informal description of ARC’s current research approach, a follow-up to "Eliciting Latent Knowledge" | Alignment Research Center