In the last article (Approximating slerp) we discussed a need for a fast and reasonably precise quaternion interpolation method. By looking at the data we arrived at two improvements to nlerp, a less precise one and a more precise one. Let’s look at their implementations and performance!| zeux.io
Execution time in many programs is dominated by memory access time, not compute time. This is becoming increasingly true with higher instruction-level parallelism, wider SIMD, larger core counts and a lack of breakthroughs in memory access latency. A lot of performance talks now start by explaining that in a cache hierarchy the last-level cache miss is 100x or more expensive than a first-level cache hit, TLB misses are scary and contiguous data is great. But there is another beast that lurks ...| zeux.io