From: Evan Miller's News (Uncensored)
Attention Is Off By One
https://www.evanmiller.org/attention-is-off-by-one.html
The Transformer architecture has a mathematical bug that has been overlooked for 6+ years. I propose fixing its outlier weights with two new devices, Softmax One and QuietAttention.
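The "Softmax One" the summary refers to adds 1 to the softmax denominator, so an attention head can emit (near) zero total weight instead of being forced to distribute a full unit of probability across the keys. A minimal NumPy sketch of that formula (the function name and the stabilization trick are mine, not from the post):

```python
import numpy as np

def softmax_one(x, axis=-1):
    """Softmax with an extra 1 in the denominator:
        softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j))
    Equivalent to a regular softmax over the logits with an implicit 0
    appended, then dropping that extra slot; outputs can sum to < 1."""
    # Subtract the max of the logits AND the implicit 0 for numerical stability.
    m = np.maximum(np.max(x, axis=axis, keepdims=True), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))
```

With strongly negative logits the weights all but vanish (the head "abstains"), while for large positive logits the result is indistinguishable from an ordinary softmax.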