This year I’ve been on sabbatical, and have spent my time upskilling in AI Safety. Part of that has been doing independent research projects in different fields. Some of those projects have resulted in useful output, notably A Toy Model of the U-AND Problem, Do No Harm? and SAEs and their Variants. And then there are others that I’ve just […]