What exactly makes training a 1000-layer Transformer difficult? - 科学空间|Scientific Spaces
https://kexue.fm/archives/8978
As is well known, today's Transformers keep getting bigger, but this "bigger" usually means "wider" rather than "deeper". GPT-3, for instance, has over a hundred billion parameters yet is still only a 96-layer Transformer, far short of the depths we can imagine. ...