Topic: Understanding the training dynamics of transformers