This post is about Nesterov’s universal algorithm and how delicate it is to claim that an algorithm is “universal”, “parameter-free”, “adaptive”, or any similar term denoting that the algorithm does not need prior knowledge of the characteristics of a function to converge at its best rate. This post was born from a […]