Attention Is Off By One
Evan Miller

The Transformer has a mathematical bug that has been overlooked for 6+ years. I propose fixing its outliers with two new devices, Softmax One and QuietAttention.
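As a rough illustration of the idea (a sketch, not the post's implementation): Softmax One is commonly described as adding 1 to the softmax denominator, i.e. softmax1(x)_i = exp(x_i) / (1 + Σ_j exp(x_j)), which lets an attention head emit near-zero weights everywhere instead of being forced to distribute a full unit of attention. A minimal NumPy version, with the usual max-subtraction for numerical stability:

```python
import numpy as np

def softmax1(x, axis=-1):
    """Softmax with an implicit extra zero logit:
    softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

    Equivalent to appending a constant 0 logit before a regular
    softmax and discarding its output slot, so the result can sum
    to anything in (0, 1) rather than exactly 1.
    """
    # Subtract m = max(max(x), 0) for stability; the "+1" term
    # in the denominator then becomes exp(-m).
    m = np.maximum(np.max(x, axis=axis, keepdims=True), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))

# With all-zero logits the weights no longer sum to 1:
print(softmax1(np.array([0.0, 0.0])))        # each entry 1/3, sum 2/3
# With very negative logits the head can "abstain" entirely:
print(softmax1(np.array([-1e3, -1e3])).sum())  # ~0
```

The intuition is that an "abstaining" option removes the pressure that (per the post) drives large outlier activations in Transformer attention.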