In the previous blogpost, we established that machine learning algorithms are often hard to tune, and hopefully explained the mechanism for why gradient descent has difficulty with linear losses. In this blogpost, we will lay out some possible solutions.