Notes on how to implement alignment in AI systems. This is necessarily a fuzzy concept, because alignment is fuzzy and AI is fuzzy. We need to make peace with the frustrations of this fuzziness and move on. 1. Fine-tuning to do nice stuff: think RLHF, Constitutional AI, etc. I’m not greatly persuaded that these are the right way to go, but they are interesting. 2. Classifying models as unaligned: I’m familiar only with mechanistic interpretability at the moment; I’m su...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Notes on AI Alignment Fast-Track: Losing control to AI. 1. Session 1: What is AI alignment? (BlueDot Impact); More Is Different for AI; Paul Christiano, What failure looks like 👈 my favourite. Cannot believe I hadn’t read this. AI Could Defeat All Of Us Combined; Why AI alignment could be hard with modern deep learning. Terminology I should have already known but didn’t: Convergent Instrumental Goals: Self-Preservation, Goal Preservation, Resource Acquisition, Self-Improvement. Ajeya Cotra’s...
Wherein the internal structure of foundation models is examined; it is observed that embeddings from different models can be mapped onto one another by structure alone, and linear alignment of model representations to human neural activity is noted.
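To make "comparing representations by structure alone" concrete, here is a minimal sketch of one standard structure-only similarity measure, linear CKA (centered kernel alignment, Kornblith et al., 2019). It is a stand-in illustration, not the unsupervised embedding-translation method or the neural-alignment analysis the notes refer to. It assumes each row is the same input (say, the same sentence) embedded by each model, and it compares only the example-by-example similarity matrices, so the two models may have different embedding dimensions. The helper names (`center_columns`, `gram`, `frob_inner`, `linear_cka`) are hypothetical, chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are examples, columns are embedding dimensions


def center_columns(x: Matrix) -> Matrix:
    """Subtract each dimension's mean across examples (removes translation)."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]


def gram(x: Matrix) -> Matrix:
    """Linear-kernel Gram matrix: dot product between every pair of examples."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]


def frob_inner(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same n examples.

    Only the centered n-by-n similarity structure is compared, so the score
    is unchanged by rotations, isotropic scaling and shifts of either space.
    """
    if len(x) != len(y):
        raise ValueError("x and y must describe the same examples, row for row")
    k = gram(center_columns(x))
    l = gram(center_columns(y))
    return frob_inner(k, l) / math.sqrt(frob_inner(k, k) * frob_inner(l, l))


# Toy "model A" embeddings of four inputs.
x = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [2.0, -1.0]]
# "Model B": same geometry, rotated 90 degrees, scaled by 3, shifted by (5, 5).
y = [[5.0, 8.0], [-1.0, 5.0], [2.0, 2.0], [8.0, 11.0]]
# Same points with the row correspondence scrambled, so the structure no longer matches.
z = [[0.0, 2.0], [1.0, 0.0], [2.0, -1.0], [-1.0, 1.0]]

print(round(linear_cka(x, y), 6))  # identical structure
print(round(linear_cka(x, z), 6))  # mismatched structure
```

The first score is 1 because a rotation, rescaling and shift leave the similarity structure intact; scrambling which row is which breaks it (about 0.80 here). The design choice of working on Gram matrices rather than raw coordinates is what lets two spaces of different dimensionality be compared at all.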