CUDA C++ Programming GuideThis guide provides a detailed discussion of the CUDA programming model and programming interface. It then describes the hardware implementation, and provides guidance on how to achieve maximum performance. The appendices include a list of all CUDA-enabled devices, detailed description of all extensions to the C++ language, listings of supported mathematical functions, C++ features supported in host and device code, details on texture fetching, technical specificatio...| docs.nvidia.com
In this post, I’ll iteratively optimize an implementation of matrix multiplication written in CUDA.My goal is not to build a cuBLAS replacement, but to deepl...| siboehm.com
Avoiding CUDA Shared Memory Bank Conflicts| Lei Mao's Log Book