You’ve probably seen a CUDA tutorial like this one — a classic “Hello World” blending CPU and GPU code in a single “heterogeneous” CUDA C++ source file, with the kernel launched using NVCC’s now-iconic triple-bracket <<<>>> syntax: 1 2 3 4 5 6 7 8 9 10 11 #include #include __global__ void kernel() { printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x); } int main() { kernel<<<1, 1>>>(); // Returns `void`?! 🤬 return cudaDeviceSynchronize() == cudaSuccess...