In the last few years, we have seen a surge of empirical and theoretical work on “scaling laws”, whose goal is to characterize the performance of learning methods as a function of various problem parameters (e.g., number of observations and parameters, or amount of compute). From a theoretical point of view, this marks a renewed interest in asymptotic equivalents, something the machine learning community had mostly moved away from (and, let’s be honest, kind of looked down on) in favor of non-asymptotic guarantees.