Topic: Forgetting Transformer: Softmax Attention with a Forget Gate