Figure 1 Placeholder.

Notes on how to implement alignment in AI systems. This is necessarily a fuzzy concept, because alignment is fuzzy and AI is fuzzy. We need to make peace with the frustrations of this fuzziness and move on.

1 Fine-tuning to do nice stuff

Think RLHF, Constitutional AI, etc. I’m not greatly persuaded that these are the right way to go, but they are interesting; see the reward-modelling sketch below.

2 Classifying models as unaligned

I’m familiar only with mechanistic interpretability at the moment; I’m su...
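To make the RLHF idea in section 1 a little more concrete, here is a minimal sketch of its first step, reward modelling: fit a scalar reward so that human-preferred responses score higher than rejected ones, via the Bradley-Terry pairwise loss. Everything here is illustrative; `TinyRewardModel` and the random "response features" are hypothetical stand-ins for a real language model's representations of (prompt, response) pairs.

```python
# Minimal sketch of RLHF's reward-modelling step (Bradley-Terry loss).
# TinyRewardModel and the random feature vectors are hypothetical
# stand-ins for a real LM's representations of (prompt, response) pairs.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps a response representation to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

dim = 16
model = TinyRewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Toy preference data: each row pairs a human-chosen response with a
# rejected one (in practice these would be LM embeddings, not noise).
chosen = torch.randn(64, dim) + 0.5
rejected = torch.randn(64, dim)

for step in range(100):
    # Bradley-Terry objective: maximise P(chosen preferred over rejected)
    # = sigmoid(r_chosen - r_rejected), i.e. minimise -logsigmoid(diff).
    loss = -torch.nn.functional.logsigmoid(
        model(chosen) - model(rejected)
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In full RLHF this reward model then drives a reinforcement-learning stage (e.g. PPO) that fine-tunes the language model itself; the loss above is just the preference-learning core.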
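And for section 2, one simple instance of classifying behaviour from a model's internals is a linear probe on cached activations. To be clear, this is probing rather than mechanistic interpretability proper, but it illustrates the "classifier over model internals" shape of the idea. The activations below are synthetic and the setup is entirely hypothetical; in practice you would cache activations from real aligned and unaligned runs.

```python
# Hypothetical sketch: a linear probe that classifies cached activations
# as coming from "aligned" vs "unaligned" behaviour. The activations are
# synthetic; in practice you would cache them from a real model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d = 32  # hidden-state dimension (assumed)

# Pretend "unaligned" runs shift the activations along some direction.
direction = rng.normal(size=d)
aligned = rng.normal(size=(500, d))
unaligned = rng.normal(size=(500, d)) + 0.8 * direction

X = np.vstack([aligned, unaligned])
y = np.array([0] * 500 + [1] * 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out probe accuracy: {probe.score(X_te, y_te):.2f}")
```

A probe like this only detects a correlate, of course: high accuracy says the information is linearly recoverable from the activations, not that the underlying mechanism is understood.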