From:
simons blog
Let the compiler do the work in CuTeDSL
https://veitner.bearblog.dev/let-the-compiler-do-the-work-in-cutedsl/
To achieve peak performance on an H100 for matrix transpose, we need to prefetch matrix tiles when we write our kernels in a non-persistent way.