From: Franz Louis Cesista
Adam with Aggressive Gradient Clipping ≈ Smoothed SignSGD/NormSGD
https://leloykun.github.io/ponder/adam-aggressive-clipping/
Why does Adam with aggressive gradient value/norm clipping have sparse updates and do well with higher learning rates? Here we show that it is essentially equivalent to a smoothed version of SignSGD/NormSGD.
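A minimal NumPy sketch of the value-clipping case, for intuition only and not taken from the linked post. It assumes the clipping threshold is smaller than every gradient entry, so each clipped gradient equals `clip * sign(g)`. Adam's second-moment estimate is then the constant `clip**2`, and the update reduces to an exponential moving average of `sign(g)`. The function names `adam_value_clipped_updates` and `smoothed_sign_updates`, the synthetic gradients, and the hyperparameter values are all illustrative assumptions.

```python
import numpy as np


def adam_value_clipped_updates(grads, clip, beta1=0.9, beta2=0.999, eps=1e-12):
    """Adam update directions (learning rate omitted) with per-entry value clipping."""
    m = np.zeros_like(grads[0])
    v = np.zeros_like(grads[0])
    updates = []
    for t, g in enumerate(grads, start=1):
        g = np.clip(g, -clip, clip)           # aggressive value clipping
        m = beta1 * m + (1 - beta1) * g       # first-moment EMA
        v = beta2 * v + (1 - beta2) * g * g   # second-moment EMA
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(m_hat / (np.sqrt(v_hat) + eps))
    return np.stack(updates)


def smoothed_sign_updates(grads, beta1=0.9):
    """Bias-corrected exponential moving average of sign(g)."""
    m = np.zeros_like(grads[0])
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * np.sign(g)
        updates.append(m / (1 - beta1 ** t))
    return np.stack(updates)


# Synthetic gradients whose entries all have magnitude >= 0.1 (assumption),
# so a threshold of 1e-3 clips every entry.
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 8))
grads = np.sign(raw) * (0.1 + np.abs(raw))

adam_dirs = adam_value_clipped_updates(grads, clip=1e-3)
sign_dirs = smoothed_sign_updates(grads)
max_gap = np.max(np.abs(adam_dirs - sign_dirs))
print("largest difference between the two update sequences:", max_gap)
```

The sketch covers only per-entry value clipping. The norm-clipping case would replace `np.sign(g)` with the normalized gradient `g / ||g||`, which is not shown here.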