The Dan MacKinlay stable of variably-well-consider’d enterprises

Figure 1: Agent foundations is the branch of AI alignment that tries to answer: if we were to build a superintelligent system from scratch, what clean, mathematical objective could we give it so that it robustly does what we want, even if we cannot understand the system ourselves? Unlike interpretability (which inspects black-box models) or preference learning (which tries to extract human values), agent foundations is about first principles: designing an agent that is “aligned by construction”.
Configuring machine learning experiments with Fiddle