In my last post, I pointed out that FastTwoSum plus a sort is actually faster than normal TwoSum on modern hardware, and suggested that TwoSum is therefore obsolete. This post expands the analysis and tests further optimizations on the Apple M1 chip.

General approach

The general FastTwoSum algorithm is the following:

    s = a + b
    bb = s - a
    e = b - bb

This assumes that |a| > |b|, or perhaps a slightly weaker condition. To satisfy this precondition, we need to sort a and b by absolute value, kind ...
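
To make the three steps concrete, here is a minimal Python sketch, assuming IEEE 754 doubles with the default round-to-nearest mode. The function names (`fast_two_sum`, `sorted_fast_two_sum`, `two_sum`, `is_exact`) are illustrative and not taken from the post, and the branch-based swap is only a stand-in for whichever sort is actually used. The classic six-operation TwoSum is included for comparison.

```python
from fractions import Fraction


def fast_two_sum(a: float, b: float):
    """Return (s, e) where s is the rounded sum and e the rounding error.

    Precondition (assumed, not checked): |a| >= |b|.
    """
    s = a + b
    bb = s - a   # the part of b that actually made it into s
    e = b - bb   # the part of b that was rounded away
    return s, e


def sorted_fast_two_sum(a: float, b: float):
    """Hypothetical wrapper: order the inputs by magnitude, then FastTwoSum."""
    if abs(b) > abs(a):
        a, b = b, a
    return fast_two_sum(a, b)


def two_sum(a: float, b: float):
    """Classic branch-free TwoSum, which needs no ordering of a and b."""
    s = a + b
    aa = s - b
    bb = s - aa
    ea = a - aa
    eb = b - bb
    return s, ea + eb


def is_exact(a: float, b: float, s: float, e: float) -> bool:
    """Check s + e == a + b in exact rational arithmetic."""
    return Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b)


if __name__ == "__main__":
    s, e = sorted_fast_two_sum(1e-16, 1.0)
    print(s, e, is_exact(1e-16, 1.0, s, e))
```

Calling `sorted_fast_two_sum(1e-16, 1.0)` swaps the arguments first, so the small term is recovered in `e` instead of being lost. Skipping the swap and calling `fast_two_sum` directly with the smaller magnitude first is not guaranteed to give an exact error term.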