Accelerating MoE’s with a Triton Persistent Cache-Aware Grouped GEMM Kernel
https://pytorch.org/blog/accelerating-moes-with-a-triton-persistent-cache-aware-grouped-gemm-kernel/
In this post, we present an optimized Triton BF16 Grouped GEMM kernel for running training and inference on Mixture-of-Experts (MoE) models such as DeepSeekv3. A Grouped GEMM applies independent GEMMs to separate groups of inputs and weights within a single kernel launch; in an MoE layer, each group corresponds to the tokens routed to one expert and that expert's weight matrix.
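
To make the idea concrete, here is a minimal sketch (not the kernel from the post) of what a grouped GEMM computes, written as a naive PyTorch loop over experts. The function name, shapes, and group sizes are illustrative assumptions; an optimized Triton kernel fuses all of these independent GEMMs into one launch instead of looping.

```python
# Naive reference for a "grouped GEMM": one independent GEMM per expert group.
# This is a sketch for illustration, not the optimized kernel described in the post.
import torch

def naive_grouped_gemm(tokens_per_expert, expert_weights):
    """tokens_per_expert: list of [m_e, K] BF16 tensors (tokens routed to expert e)
    expert_weights:       list of [K, N] BF16 tensors (one weight matrix per expert)
    Returns a list of [m_e, N] outputs, one independent GEMM per group."""
    return [x @ w for x, w in zip(tokens_per_expert, expert_weights)]

# Example: 4 experts, each receiving a different number of routed tokens (hypothetical sizes)
torch.manual_seed(0)
K, N = 512, 1024
groups = [torch.randn(m, K, dtype=torch.bfloat16) for m in (7, 128, 33, 64)]
weights = [torch.randn(K, N, dtype=torch.bfloat16) for _ in range(4)]
outs = naive_grouped_gemm(groups, weights)
print([tuple(o.shape) for o in outs])  # [(7, 1024), (128, 1024), (33, 1024), (64, 1024)]
```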