Self-exfiltration is a key dangerous capability
We need to measure whether LLMs could “steal” themselves
aligned.substack.com