I want a theory that predicts which features deep nets learn, when they learn them, and why. But neural nets are messy and hard to analyse, so we need some way of simplifying them for analysis that still recovers the properties we care about. Deep linear networks (DLNs) are one attempt at that: models that keep depth, nonconvexity, and hierarchical representation formation while remaining analytically tractable. In principle, they let me connect data geometry (singular ...| The Dan MacKinlay stable of variably-well-consider’d enterprises
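As a concreteness check, a minimal NumPy toy of the object in question (the dimensions, learning rate, and variable names here are my own illustration, not anything from the post): a depth-3 deep linear network whose end-to-end map is linear but whose loss is non-convex in the factors, which is the setting the DLN literature analyses in terms of the target’s singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth, lr, steps = 8, 3, 0.01, 5000

# Depth-3 deep linear network: the end-to-end map W3 @ W2 @ W1 is linear,
# but the squared loss is non-convex in the individual factors.
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(depth)]
target = rng.normal(size=(d, d))  # arbitrary target linear map

def compose(factors):
    out = np.eye(d)
    for W in factors:
        out = W @ out
    return out

for _ in range(steps):
    E = compose(Ws) - target  # residual of the end-to-end map
    grads = []
    for i in range(depth):
        above = compose(Ws[i + 1:])  # product of layers above layer i
        below = compose(Ws[:i])      # product of layers below layer i
        grads.append(above.T @ E @ below.T)  # gradient of 0.5 * ||E||_F^2 w.r.t. W_i
    for W, g in zip(Ws, grads):
        W -= lr * g

# DLN analyses describe how the singular modes of `target` are picked up over training.
print(np.linalg.svd(compose(Ws), compute_uv=False))
```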
1 Origin story Quantization, in a general sense, is the process of mapping a continuous or large set of values to a smaller, discrete set. The concept has roots in signal processing and information theory; see Vector Quantization (VQ), which emerged in the late 1970s and early 1980s. Think things like the Linde-Buzo-Gray (LBG) algorithm (Linde, Buzo, and Gray 1980). VQ represents vectors from a continuous space using a finite set of prototype vectors from a “codebook,” often...| The Dan MacKinlay stable of variably-well-consider’d enterprises
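A minimal sketch of the codebook idea in plain NumPy, using a Lloyd-style alternation rather than the full LBG splitting schedule (function and variable names are mine):

```python
import numpy as np

def fit_codebook(x, k=8, iters=50, seed=0):
    """Fit a k-entry codebook to vectors x of shape (n, d) by alternating
    nearest-codeword assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every vector to its nearest codeword (squared Euclidean distance)
        d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # move each codeword to the mean of the vectors assigned to it
        for j in range(k):
            members = x[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook, assign

# Quantization: each input vector is represented by the index of its nearest codeword.
x = np.random.default_rng(1).normal(size=(1000, 2))
codebook, codes = fit_codebook(x, k=4)
```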
There is a lot of fractal-like behaviour in NNs. Not all the senses in which “fractal-like behaviour” is used are the same; Figure 2 finds fractals in a transformer residual stream, for example, but there are fractal loss landscapes, fractal optimiser paths… I bet some of these things connect pretty well. Let’s find out. 1 Fractal loss landscapes More loss landscape management here [Andreeva et al. (2024); Hennick and Baerdemacker (2025)]. Estimation theory for fractal qualities ...| The Dan MacKinlay stable of variably-well-consider’d enterprises
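For the estimation-theory angle, a crude box-counting dimension estimator, assuming the object of interest is a point cloud such as optimiser iterates or residual-stream activations (the function name and the scale schedule are my own choices):

```python
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
    """Crude box-counting dimension estimate for a point cloud of shape (n, d)."""
    # rescale the cloud into the unit hypercube
    points = (points - points.min(axis=0)) / (np.ptp(points, axis=0) + 1e-12)
    counts = []
    for s in scales:
        # count the boxes of side 1/s that contain at least one point
        occupied = np.unique(np.floor(points * s).astype(int), axis=0)
        counts.append(len(occupied))
    # slope of log(count) against log(boxes per side) estimates the dimension
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

# e.g. a 2-D random-walk trace; a filled square would give roughly 2, a straight line roughly 1
pts = np.cumsum(np.random.default_rng(0).normal(size=(5000, 2)), axis=0)
print(box_counting_dimension(pts))
```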
Albergo, Boffi, and Vanden-Eijnden. 2023. “Stochastic Interpolants: A Unifying Framework for Flows and Diffusions.”| The Dan MacKinlay stable of variably-well-consider’d enterprises
Neural denoising diffusion models of language| The Dan MacKinlay stable of variably-well-consider’d enterprises
I’m going to ICLR in Singapore this year to present some papers (MacKinlay 2025; MacKinlay et al. 2025). 1 Workshops Machine Learning for Remote Sensing Deep Generative Models in Machine Learning: Theory, Principle and Efficacy Frontiers in Probabilistic Inference: Sampling Meets Learning Open Science for Foundation Models (SCI-FM) Advances in Approximate Bayesian Inference 2 References MacKinlay. 2025. “The Ensemble Kalman Update Is an Empirical Matheron Update.” MacKinlay, T...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Normalising flows for PDE learning. Lipman et al. (2023) seems to be the origin point, extended by Kerrigan, Migliorini, and Smyth (2024) to function-valued PDEs. Figure 2: An illustration of our FFM method. The vector field (in black) transforms a noise sample drawn from a Gaussian process with a Matérn kernel (at t=0) to the function (at t=1) via solving a function space ODE. By sampling many such , we define a conditional path of measures approximately interpolating between and the f...| The Dan MacKinlay stable of variably-well-consider’d enterprises
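A discretised toy of the training objective, assuming the standard conditional flow-matching recipe with a straight-line path and a white-noise base; the Matérn GP base and the function-space ODE of FFM are simplified away, and every name below is mine:

```python
import torch
import torch.nn as nn

# Functions are discretised on a grid; we regress a network v(x_t, t) onto the
# conditional velocity x1 - x0 along the straight-line path x_t = (1 - t) x0 + t x1.
grid = 64
net = nn.Sequential(nn.Linear(grid + 1, 128), nn.ReLU(), nn.Linear(128, grid))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sample_batch(n=32):
    x1 = torch.sin(torch.linspace(0, 6.28, grid)) * torch.rand(n, 1)  # stand-in "data" functions
    x0 = torch.randn(n, grid)                                         # noise draws (white, for brevity)
    return x0, x1

for step in range(1000):
    x0, x1 = sample_batch()
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1        # point on the interpolating path
    target = x1 - x0                  # conditional velocity field
    pred = net(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```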
Diffusion models for PDE learning. Slightly confusing terminology, because we are using diffusion models to learn PDEs, but the PDEs themselves are often used to model diffusion processes. Also sometimes the diffusion models that do the modelling aren’t actually diffusive, but are based on Poisson flow generative models. Naming things is hell. 1 Classical diffusion models TBD 2 Poisson Flow generative models These are based on non-diffusive physics but also seem to be used to simu...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Placeholder while I think about the practicalities and theory of AI agents. Practically, this usually means many agents. See also Multi agent systems. 1 Factored cognition Field of study? Or one company’s marketing term? Factored Cognition | Ought: In this project, we explore whether we can solve difficult problems by composing small and mostly context-free contributions from individual agents who don’t know the big picture. Factored Cognition Primer 2 Incoming Introducing smola...| The Dan MacKinlay stable of variably-well-consider’d enterprises
We can build automata from neural nets. And they seem to do weird things, like learn languages, in a predictable way, which is wildly at odds with our traditional understanding of the difficulty of the task (Paging Doctor Chomsky). How can we analyse NNs in terms of computational complexity? What are the useful results in this domain? Related: grammatical inference, memory machines, overparameterization, NN compression, learning automata, NN at scale, explainability… 1 Computation...| The Dan MacKinlay stable of variably-well-consider’d enterprises
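The converse direction is easy to make concrete: a hand-wired recurrent unit that implements a two-state automaton, loosely in the spirit of the classic second-order-RNN constructions (the example is mine, not from the notebook):

```python
# Hand-wire a one-unit recurrent "network" implementing the two-state parity
# automaton: the state flips on input 1 and stays on input 0, and a string is
# accepted iff it contains an even number of 1s. The update is a thresholded
# multiplicative (second-order) form, h' = step(h + x - 2*h*x - 0.5), i.e. XOR.
def parity_rnn(bits):
    h = 0.0  # hidden state mirrors the automaton's state: 0 = "even so far"
    for x in bits:
        h = 1.0 if (h + x - 2 * h * x) > 0.5 else 0.0
    return h == 0.0  # accept iff we end in the "even" state

assert parity_rnn([1, 0, 1])        # two 1s: accepted
assert not parity_rnn([1, 1, 1])    # three 1s: rejected
```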
Notes on AI Alignment Fast-Track - Losing control to AI 1 Session 1 What is AI alignment? – BlueDot Impact More Is Different for AI Paul Christiano, What failure looks like 👈 my favourite. Cannot believe I hadn’t read this. AI Could Defeat All Of Us Combined Why AI alignment could be hard with modern deep learning Terminology I should have already known but didn’t: Convergent Instrumental Goals: Self-Preservation, Goal Preservation, Resource Acquisition, Self-Improvement. Ajeya Cotra’s...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Placeholder. Levers for Biological Progress - by Niko McCarty In order for 50-100 years of biological progress to be condensed into 5-10 years of work, we’ll need to get much better at running experiments quickly and also collecting higher-quality datasets. This essay focuses on how we might do both, specifically for the cell. Though my focus in this essay is narrow — I don’t discuss bottlenecks in clinical trials, human disease, or animal testing — I hope others will take o...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Placeholder for notes on what kind of world models reside in neural nets. 1 Incoming NeurIPS 2023 Tutorial: Language Models meet World Models 2 References Basu, Grayson, Morrison, et al. 2024. “Understanding Information Storage and Transfer in Multi-Modal Large Language Models.” Chirimuuta. 2025. “The Prehistory of the Idea That Thinking Is Modelling.” Human Arenas. Ge, Huang, Zhou, et al. 2024. “WorldGPT: Empowering LLM as Multimodal World Model.” In Proceedings of the ...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Certifying NNs to be what they say they are. Various interesting challenges in this domain. I am not sure if this is a well-specified category in itself. Possibly at some point I will separate the cryptographic verification from other certification ideas. Or maybe some other taxonomy? TBD 1 Ownership of models Keyword: Proof-of-learning, … (Garg et al. 2023; Goldwasser et al. 2022; Jia et al. 2021) TBD 2 Proof of training E.g. Abbaszadeh et al. (2024): A zero-knowledge proof of trai...| The Dan MacKinlay stable of variably-well-consider’d enterprises
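To make the proof-of-learning flavour concrete, a toy transcript-commitment sketch, my own gloss on the rough idea (the prover logs hashed checkpoints plus the data consumed between them so a verifier can later re-execute sampled segments); it is illustrative only, not any of the cited protocols:

```python
import hashlib
import json

def commit(checkpoint_bytes, batch_ids):
    """Hash a (checkpoint, batch indices) record into one commitment string."""
    record = {"ckpt": hashlib.sha256(checkpoint_bytes).hexdigest(),
              "batches": list(batch_ids)}
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

# Prover side: every 100 steps, commit to the serialised weights and the batches consumed.
transcript = []
for step in range(0, 1000, 100):
    weights = f"serialised-weights-at-step-{step}".encode()  # stand-in for real parameters
    transcript.append(commit(weights, range(step, step + 100)))

# Verifier side (sketch): pick a random segment k, re-run training from checkpoint k
# on the logged batches, and check the recomputed commitment for k+1 matches.
```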
Are Transformers Turing-complete? A Good Disguise Is All You Need. Transformer architectures cannot simulate computer programs. They are not Turing-complete, despite what several papers have claimed.| Life Is Computation
Previously, I discussed training a neural net to clean up images. I’m pleased to say that, using more sophisticated techniques, I’ve since achieved much better results. My latest approa…| Christopher Olah's Blog
For the last few weeks, I’ve been taking part in a small weekly neural net study group run by Michael Nielsen. It’s been really awesome! Neural nets are very very cool! They’re so…| Christopher Olah's Blog