The goal of this tutorial is to elucidate the concepts and techniques involving memory copy when programming on NVIDIA® GPUs using CUTLASS and its core backend library CuTe. Specifically, we will stud…| Colfax Research
In this blog post I want to show how to implement a highly efficient matrix transpose operation for Hopper GPUs. I will use native CUDA APIs without abstract...| simons blog