Novel optimizers for maximally updating both the weights and activations of neural networks while keeping weight norms under control. To get there, we needed to invent an efficient, GPU/TPU-friendly method for eigenvalue clipping and solve the Steepest Descent problem on the Positive Semidefinite Cone, Convex Spectrahedron, and finally on the Spectral Ball.| leloykun.github.io
上回说到,当我们把优化对象从向量参数转移到矩阵参数,并选用更适合矩阵的谱范数约束后,Muon优化器便自然而然地出现了。进一步地,我们考虑了给参数加上正交约束后的最速下降方向,这其中又分方阵和非方...| kexue.fm