From: kexue.fm
A Brief Discussion of Transformer Initialization, Parameterization, and Normalization - 科学空间|Scientific Spaces
https://kexue.fm/archives/8620
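A few days ago, while training a new Transformer model, I found that it would not converge no matter what I tried. After some debugging, I discovered that when computing Self Attention, $\boldsymbol{Q}\boldsymbol{...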