In classical statistics there are families of model complexity estimates, loosely and collectively referred to as the “degrees of freedom” of a model. Neither computationally nor practically do they scale up to overparameterized NNs, and other tools are used instead. Exception: Shoham, Mor-Yosef, and Avron (2025) argue for a connection to the Takeuchi Information Criterion. These end up being popular in developmental interpretability. 1 Learning coefficient The major output of singul...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Overview Maximum likelihood estimation (MLE) is a gold-standard estimation procedure in non-Bayesian statistics, and the likelihood function is central to Bayesian statistics (even though it is not maximized in the Bayesian paradigm). MLE may be unpenalized (the standard approach), or penalty functions such as L1 (lasso; absolute-value penalty) and L2 (ridge regression; quadratic penalty) may be added to the log-likelihood to achieve shrinkage (aka regularization). I have been doing...| Statistical Thinking
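A minimal sketch (not from the linked post) of what adding a penalty to the log-likelihood looks like in practice: penalized MLE for a Gaussian linear model, where the L2 (ridge) penalty is added to the negative log-likelihood before minimizing. Setting the penalty weight to zero recovers plain MLE (here, ordinary least squares). The data and penalty weight are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.5, 0.0, -2.0])
y = X @ true_beta + rng.normal(scale=0.5, size=100)

def penalized_nll(beta, lam):
    # Gaussian negative log-likelihood (up to constants and scale)
    # plus an L2 penalty on the coefficients.
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * beta @ beta

mle = minimize(penalized_nll, np.zeros(3), args=(0.0,)).x
ridge = minimize(penalized_nll, np.zeros(3), args=(10.0,)).x

# The penalized estimate is shrunk towards zero relative to the MLE.
print(np.linalg.norm(ridge) < np.linalg.norm(mle))
```

An L1 penalty (`lam * np.abs(beta).sum()`) would give the lasso instead; it is not smooth at zero, so dedicated solvers are usually preferred over generic gradient-based minimization.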
This is an addendum to my post about typicality, where I try to quantify flawed intuitions about high-dimensional distributions.| Sander Dieleman
A summary of my current thoughts on typicality, and its relevance to likelihood-based generative models.| Sander Dieleman