Wherein algebraic geometry is applied to characterise singularities in the loss surfaces of overparameterized neural networks, and the local learning coefficient is introduced as an effective dimension.| The Dan MacKinlay stable of variably-well-consider’d enterprises
Assumed audience: Mid career technical researchers considering moving into AI Safety research, career advisors in the EA/AI Safety space, AI Safety employers and grantmakers. tl;dr: AI career advice orgs, prominently 80,000 Hours, encourage career moves into AI risk roles, including mid‑career pivots into roles in AI safety research labs. Without side information, that advice is not credible for mid‑career readers, because it does not have a calibration mechanism. Advice organizations i...
Figure 1 Agent foundations is the branch of AI alignment that tries to answer: if we were to build a superintelligent system from scratch, what clean, mathematical objective could we give it so that it robustly does what we want, even if we cannot understand the system ourselves? Unlike interpretability (which inspects black-box models) or preference learning (which tries to extract human values), agent foundations is about first principles: designing an agent that’s “aligned by construc...
Figure 1: This stepper machine is my kind of fitness landscape. Fitness, in evolutionary biology, measures an organism’s expected reproductive success. Utility, in economics and decision theory, measures an agent’s preferences, i.e. it is what we seek out. We often blur the lines between what an organism wants and what it evolutionarily needs. Why do we love sugar? The standard explanation is that in ancestral environments, sweetness signalled calorie density, which aided survival and r...
Figure 1 I’m working on some proposals in AI safety at the moment, including this one. I submitted this particular one to the UK AISI Alignment Project. It was not funded. Note that this post is different than many on this blog. It’s highly speculative and yet not that measured; that’s because it’s a pitch, not an analysis. It doesn’t contain a credible amount of detail (there were only two text fields with a 500 word limit to explain the whole thing). I present it here for comment ...
Wherein a failed application is set forth, and two research pathways are outlined: a Bias‑Robust Oversight programme at UTS’s Human Technology Institute, and MCMC estimation of the Local Learning Coefficient with Timaeus’ Murfet.
I need to mention Adam and RMSProp etc somewhere
I split this off from clickbait bandits for discoverability, and because it has grown larger than its source notebook. Figure 1 Since the advent of the LLM era, the term human reward hacking has become salient. This is because we fine tune lots of LLMs using reinforcement learning, and RL algorithms are notoriously prone to “cheating” in a manner we interpret as “reward hacking”. Things I have been reading on this theme: Benton et al. (2024), Greenblatt et al. (2024), Laine et al. (2...
Singular Learning Theory’s eldest child in practice
Oddesty
7.1 Value function methods
Distributed sensing, swarm sensing, adaptive social learning, multi-agent adaptation, iterated game theory with learning etc
Iterated and evolutionary game theory
Getting ready for the grown-ups to arrive
Microeconomics of compute
Figure 1 I want a theory that predicts which features deep nets learn, when they learn them, and why. But neural nets are messy and hard to analyse, so we need to find some way of simplifying them for analysis which still recovers the properties we care about. Deep linear networks (DLNs) are one attempt at that: models that keep depth, nonconvexity, and hierarchical representation formation while remaining analytically tractable. In principle, they let me connect data geometry (singular ...
Reinforcement learning meets iterated game theory meets theory of mind
1 Origin story Figure 1 Quantization, in a general sense, is the process of mapping a continuous or large set of values to a smaller, discrete set. This concept has roots in signal processing and information theory; search for Vector Quantization (VQ), which emerged in the late 1970s and early 1980s. Think things like the Linde-Buzo-Gray (LBG) algorithm (Linde, Buzo, and Gray 1980). VQ represents vectors from a continuous space using a finite set of prototype vectors from a “codebook,” often...
Figure 1 In classical statistics there are families of model complexity estimates, which are loosely collectively referred to as “Degrees of freedom” of a model. Neither computationally nor practically do they scale up to overparameterized NNs, and there are other tools. Exception: Shoham, Mor-Yosef, and Avron (2025) argue for a connection to the Takeuchi Information Criterion. These end up being popular in developmental interpretability. 1 Learning coefficient The major output of singul...
Figure 1 Let’s reason backwards from the final destination of civilisation, if such a thing there be. What intelligences persist at the omega point? With what is superintelligence aligned in the big picture? Various authors have tried to put modern AI developments in continuity with historical trends from less materially-sophisticated societies, through more legible, compute-oriented societies, to some attractor or set of attractors at the end of history. Computational superorganisms. Singularities....
Figure 1 The open-source AI scene has been kicking goals. People have wrestled models, datasets, and all the fixings away from the big-wigs. The final boss of that game is the access to expensive compute. Training a foundation model from scratch takes a warehouse full of GPUs that costs more than a small nation’s GDP. It’s been the one thing keeping AI development firmly in the hands of a few tech giants with cash to burn. Until now, maybe. So, the citizen science equivalent for the NN a...
Fine tuning foundation models
I don’t know much about this variant of Bayes, but the central idea is that we consider Bayes updating as a coherent betting rule and back everything else out from that. This gets us something like classic Bayes but with an even more austere approach to what probability is. I am interested in this because, following an insight of Susan Wei’s, I note that it might be an interesting way of understanding when foundation models do optimal inference, since most neural networks are best underst...
Coarse-graining empowerment
Figure 1 I’m not sure what this truly means or that anyone is, but I think it wants to mean something like quantifying architectures that make it “easier” to learn about the phenomena of interest. This is a practical engineering discipline in NNs but maybe also interesting to think about in humans.
The United States of America is a country in North America, bordered by Canada to the north, Mexico to the south, the Atlantic Ocean to the east, and the Pacific Ocean to the west. They share, with Canada, the honour of inventing peanut butter.
Figure 1 There is lots of fractal-like behaviour in NNs. Not all the senses in which fractal-like behaviour is used are the same; Figure 2 finds fractals in a transformer residual stream for example, but there are fractal loss landscapes, fractal optimiser paths… I bet some of these things connect pretty well. Let’s find out. 1 Fractal loss landscapes More loss landscape management here [Andreeva et al. (2024); Hennick and Baerdemacker (2025)]. Estimation theory for fractal qualities ...
Albergo, Boffi, and Vanden-Eijnden. 2023. “Stochastic Interpolants: A Unifying Framework for Flows and Diffusions.”
Figure 1: And Nadab and Abihu, the sons of Aaron, each took his censer, put fire in it, added incense, and offered strange fire before the Lord, which He had not commanded them. Then fire went out from the Lord and devoured them, and they died before the Lord. Lev 10:1-2 Notes on committing to things, and the implications of that for cooperation. Relevant to multi-agent causality where agents make decisions, in the context of iterated games in multi-agent systems with applications to AI safe...
Disentangled representation learning
Figure 1 An interesting inverse design question: how should I design a system to optimise for truthfulness? Brief summary here. @Frongillo2024Recent: This note provides a survey for the Economics and Computation community of some recent trends in the field of information elicitation. At its core, the field concerns the design of incentives for strategic agents to provide accurate and truthful information. Such incentives are formalized as proper scoring rules, and turn out to be the same obj...
Figure 1 Learning agents in a multi-agent system which account for and/or exploit the fact that other agents are learning too. This is one way of formalising the idea of theory of mind. Learning with theory of mind works out nicely for reinforcement learning, in e.g. opponent shaping, and may be an important tool for understanding AI agency and AI alignment, as well as aligning more general human systems. Other interesting things might arise from a good theory of other-aware learning, such ...
Figure 1 An interesting Bayesian functional regression trick based on the so-called q-exponential distribution, which has a special relationship with Besov spaces and so connects to functional inverse problems. It seems to be in the same family as elliptical processes as per Bånkestad et al. (2020). NB, the q-exponential distribution here is not the Tsallis q-exponential distribution but rather one developed by Dashti, Harris, and Stuart (n.d.). Li, O’Connor, and Lan (2023): Regularization is one ...
Singapore
Configuring machine learning experiments with Fiddle
Content warning: Links to and discussion of edgy people with perverse opinions on hot-button topics too diverse to mention but which surely include gender, eugenics, speech and religion. Notes on eccentrics, mavericks, outsider geniuses and fools. What my family called stroppy people. Lacking an identifiable label so you show up in diversity metrics? Not sure whether you are rebelling against society or conforming to a subgroup? How do you get by as a mad outsider? Will you be right twice a da...
Neural denoising diffusion models of language
1 Key Research Directions
Game theory and decision theory for lots of interacting agents
Figure 1 Jonathan Huggins summarizes: Complexity of Inference in Bayesian Networks. To cover: Sampling from a posterior measure versus calculating it, approximation versus exact computation. Graphical models. What does calculation even mean on arbitrary measure spaces? 1 References Bodlaender, Donselaar, and Kwisthout. 2022. “Parameterized Complexity Results for Bayesian Inference.” Cooper. 1990. “The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks....
Figure 1 On rituals without (necessarily) faith. TBD Related: tribal bonding, mind altering substances. 1 Incoming Future Day 2025 – Science, Technology & the Future Wheal’s Homegrown Humans Newsletter Ritual Behavior, Habits, Human Culture, Religion, Civilization, Marriage, Death, Burning Man & Community | Dimitris Xygalatas | #75 (5) Psychedelics, Civilization, Religion, Death & Plant Medicine | Brian Muraresku | #1 Shamanism, Psychedelics, Social Behavior, Religion & Evolution of Huma...
Figure 1 I’m going to ICLR in Singapore this year to present some papers (MacKinlay 2025; MacKinlay et al. 2025). 1 Workshops Machine Learning for Remote Sensing Deep Generative Models in Machine Learning: Theory, Principle and Efficacy Frontiers in Probabilistic Inference: Sampling Meets Learning Open Science for Foundation Models (SCI-FM) Advances in Approximate Bayesian Inference 2 References MacKinlay. 2025. “The Ensemble Kalman Update Is an Empirical Matheron Update.” MacKinlay, T...
Normalising flows for PDE learning. Figure 1 Lipman et al. (2023) seems to be the origin point, extended by Kerrigan, Migliorini, and Smyth (2024) to function-valued PDEs. Figure 2: An illustration of our FFM method. The vector field (in black) transforms a noise sample drawn from a Gaussian process with a Matérn kernel (at ) to the function (at ) via solving a function space ODE. By sampling many such , we define a conditional path of measures approximately interpolating between and the f...
Diffusion models for PDE learning. Figure 1 Slightly confusing terminology, because we are using diffusion models to learn PDEs, but the PDEs themselves are often used to model diffusion processes. Also sometimes the diffusion models that do the modelling aren’t actually diffusive, but are based on Poisson flow generative models. Naming things is hell. 1 Classical diffusion models TBD 2 Poisson Flow generative models These are based on non-diffusive physics but also seem to be used to simu...
Figure 1 I just ran into this area while trying to invent something similar myself, only to find I’m years too late. It’s an interesting analysis suited to relaxed or approximated causal modelling of causal interventions. It seems to formalise coarse-graining for causal models. We suspect that the notorious causal inference in LLMs might be built out of such things or understood in terms of them. 1 Causality in hierarchical systems A. Geiger, Ibeling, et al. (2024) seems to summarise SOT...
Figure 1 Placeholder. I knew little about the Low Countries, but between my abiding interest in Indonesia, the jewel in the former Dutch trading empire, and the fact that I live in Australia, a country discovered for Europe by the Dutch, and my obsession with the Dutch Rijksmuseum, I confess I have intrigued myself. Cheapassbikes History Of The Netherlands podcast — Republic of Amsterdam Radio 1 Inventions of capitalism Many early features of capitalism started in the Netherlands. Financia...
Figure 1 Placeholder while I think about the practicalities and theory of AI agents. Practically, this usually means many agents. See also Multi agent systems. 1 Factored cognition Field of study? Or one company’s marketing term? Factored Cognition | Ought: In this project, we explore whether we can solve difficult problems by composing small and mostly context-free contributions from individual agents who don’t know the big picture. Factored Cognition Primer 2 Incoming Introducing smola...
Figure 1 Placeholder. Notes on how to implement alignment in AI systems. This is necessarily a fuzzy concept, because Alignment is fuzzy and AI is fuzzy. We need to make peace with the frustrations of this fuzziness and move on. 1 Fine tuning to do nice stuff Think RLHF, Constitutional AI etc. I’m not greatly persuaded that these are the right way to go, but they are interesting. 2 Classifying models as unaligned I’m familiar only with mechanistic interpretability at the moment; I’m su...
Figure 1 1 Incoming Inside the U.K.’s Bold Experiment in AI Safety | TIME Governing with AI | Justin Bullock Deep atheism and AI risk - Joe Carlsmith Wong and Bartlett (2022): we hypothesize that once a planetary civilization transitions into a state that can be described as one virtually connected global city, it will face an ‘asymptotic burnout’, an ultimate crisis where the singularity-interval time scale becomes smaller than the time scale of innovation. If a civilization develo...
Figure 1 Notebook on the idea of human domestication. Possibly the opposite of being stroppy. Paul Christiano, What failure looks like Amongst the broader population, many folk already have a vague picture of the overall trajectory of the world and a vague sense that something has gone wrong. There may be significant populist pushes for reform, but in general these won’t be well-directed. Some states may really put on the brakes, but they will rapidly fall behind economically and militaril...
Figure 1 We can build automata from neural nets. And they seem to do weird things, like learn languages, in a predictable way, which is wildly at odds with our traditional understanding of the difficulty of the task (Paging Doctor Chomsky). How can we analyse NNs in terms of computational complexity? What are the useful results in this domain? Related: grammatical inference, memory machines, overparameterization, NN compression, learning automata, NN at scale, explainability… 1 Computation...
Content warning: Stuff that I would prefer to have no opinion upon if that were an option. Culture wars. Figure 1 An interesting question in movement design is how much to embody the change you wish to see in the world in your pursuit of it. Examples If your movement advocates 8-hour workdays, should you work 8-hour days to build it? If you are relentlessly corporate in your pursuit of communism, will you have trouble recruiting cadre? Was it an acceptable trade-off for communism that the Bo...
Notes on AI Alignment Fast-Track - Losing control to AI 1 Session 1 What is AI alignment? – BlueDot Impact More Is Different for AI Paul Christiano, What failure looks like 👈 my favourite. Cannot believe I hadn’t read this. AI Could Defeat All Of Us Combined Why AI alignment could be hard with modern deep learning Terminology I should have already known but didn’t: Convergent Instrumental Goals. Self-Preservation Goal Preservation Resource Acquisition Self-Improvement Ajeya Cotra’s...
Figure 1 One of the problems of the modern world is that it is so deep, specialised, and complicated that it is hard to tell real progress in some specialised areas from bullshit. This is a problem in science, but, I think, also everywhere that anything complicated is happening. This leads to the deeper problem that it is easier to seem good than to be good. Sometimes, e.g. art, these can be nearly the same thing. But sometimes, with the production of material goods or the production of ...
Figure 1 Placeholder. Levers for Biological Progress - by Niko McCarty In order for 50-100 years of biological progress to be condensed into 5-10 years of work, we’ll need to get much better at running experiments quickly and also collecting higher-quality datasets. This essay focuses on how we might do both, specifically for the cell. Though my focus in this essay is narrow — I don’t discuss bottlenecks in clinical trials, human disease, or animal testing — I hope others will take o...
Figure 1 A mess of links about the state and its capacity to do things, incorporating institutions, social licence and the like. Participatory budgeting, participatory resource management, lack thereof. Practically speaking, it seems that most citizens would kinda like the state to invest in maintaining a society that was not a crumbling pit of poverty awash in disease and toxic waste and ruled over by robber barons. However, given that they don’t trust the state to get anything right, the...
Figure 1 Placeholder, for notes on what kind of world models reside in neural nets. 1 Incoming NeurIPS 2023 Tutorial: Language Models meet World Models 2 References Basu, Grayson, Morrison, et al. 2024. “Understanding Information Storage and Transfer in Multi-Modal Large Language Models.” Chirimuuta. 2025. “The Prehistory of the Idea That Thinking Is Modelling.” Human Arenas. Ge, Huang, Zhou, et al. 2024. “WorldGPT: Empowering LLM as Multimodal World Model.” In Proceedings of the ...