A few recent arXiv papers and some recent conversations during my lectures made me realize that some optimization people might not be fully aware of important details about SGD when it is used on functions whose minimizer can be arbitrarily far from the initialization, or that have no minimizer at all. So, […]