TL;DR: Training large language models (LLMs) with billions of parameters is computationally intensive and requires heavy inter-node communication in specialized data centers. Nous Research released DeMo, showing how to reduce these communication costs by orders of magnitude, lowering expenses and enabling training over slower network connections and on less expensive hardware. This