I want to use a SASS instruction which (AFAICT) is not available via a PTX instruction as of CUDA 12.4. Namely, suppose it is: HMMA.16816.F16 - a warp-wide matrix-multiply-and-add, of half-precisio... | Stack Overflow
1.1. Scalable Data-Parallel Computing using GPUs | docs.nvidia.com