It seemed a bit unfair to devote a blog to machine learning (ML) without talking about its current core algorithm: stochastic gradient descent (SGD). Indeed, year after year, SGD has become the basic building block of many algorithms used for large-scale ML problems. However, the history of stochastic approximation is much older than that of ML: its first study, by Robbins and Monro [1], dates back to 1951. Their aim was to find the zeros of a function that can only be accessed through noisy measurements.
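To make that setting concrete, here is a minimal sketch of the Robbins-Monro recursion \(\theta_{n+1} = \theta_n - \gamma_n \, Y_n\), where \(Y_n\) is a noisy measurement of \(f(\theta_n)\). The particular function, noise level, and step-size choice below are illustrative assumptions on my part, not taken from the original paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_f(theta):
    # Noisy measurement of f(theta) = theta - 2, whose unique zero is
    # theta* = 2; the linear f and the Gaussian noise are assumptions
    # made purely for illustration.
    return (theta - 2.0) + rng.normal(scale=0.5)

theta = 0.0
for n in range(1, 10_001):
    # Classical step sizes gamma_n = 1/n satisfy the Robbins-Monro
    # conditions: sum(gamma_n) = infinity, sum(gamma_n**2) < infinity.
    gamma_n = 1.0 / n
    theta -= gamma_n * noisy_f(theta)

print(theta)  # close to the zero theta* = 2
```

With these step sizes the iterates average out the measurement noise while still moving far enough to reach the zero, which is exactly the trade-off the two step-size conditions encode.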