I often find it's useful to have four threads (or lanes, depending on your preferred parlance) collaborate on a 2×2 block of pixels. Traditionally this would be done with group shared memory. With Shader Model 6, however, we have the QuadReadAcrossX/Y/Diagonal intrinsics. The advantage of these intrinsics over group shared memory is that group shared memory […]
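
To make the data movement concrete, here is a minimal CPU-side sketch in C++ that models what the three quad reads hand back to each lane. It is not shader code: the `Quad` type and the functions `quadReadAcrossX`, `quadReadAcrossY`, `quadReadAcrossDiagonal` and `quadSum` are hypothetical stand-ins named after the intrinsics, and the lane layout (bit 0 of the lane index is x, bit 1 is y) is an assumption of the model.

```cpp
#include <array>
#include <cstddef>

// CPU model of one 2x2 quad; illustrative only, not shader code.
// Assumed lane layout: bit 0 of the lane index is x, bit 1 is y.
//   lane 0 = (0,0)   lane 1 = (1,0)
//   lane 2 = (0,1)   lane 3 = (1,1)
using Quad = std::array<float, 4>;

// Each function returns, per lane, the value held by that lane's partner.
// Horizontal neighbour: lane index with bit 0 flipped.
Quad quadReadAcrossX(const Quad& v) { return {v[1], v[0], v[3], v[2]}; }

// Vertical neighbour: lane index with bit 1 flipped.
Quad quadReadAcrossY(const Quad& v) { return {v[2], v[3], v[0], v[1]}; }

// Diagonal neighbour: lane index with both bits flipped.
Quad quadReadAcrossDiagonal(const Quad& v) { return {v[3], v[2], v[1], v[0]}; }

// Every lane combines its own value with its three partners, so all four
// lanes end up holding the sum over the 2x2 block.
Quad quadSum(const Quad& v) {
    const Quad x = quadReadAcrossX(v);
    const Quad y = quadReadAcrossY(v);
    const Quad d = quadReadAcrossDiagonal(v);
    Quad out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane) {
        out[lane] = v[lane] + x[lane] + y[lane] + d[lane];
    }
    return out;
}
```

In this model, each lane reaches the other three values with three reads, and no lane has to write to a shared buffer first.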