I’m not talking about skill, knowledge, or convincing a world focused on radical acceleration that optimization is necessary. Performance optimization is hard because it’s fundamentally a brute-force task, and there’s nothing you can do about it. This post is a bit of a rant on my frustrations with code optimization. I’ll also try to give actionable advice, which I hope enhances your experience.| purplesyringa's blog
For High-Performance Computing engineers, here’s the gist: On Intel CPUs, the vaddps instruction (vectorized float addition) executes on ports 0 and 5. The vfmadd132ps instruction (vectorized fused float multiply-add, or FMA) also executes on ports 0 and 5. On AMD CPUs, however, the vaddps instruction takes ports 2 and 3, and the vfmadd132ps instruction takes ports 0 and 1. Since FMA is equivalent to simple addition when one of the arguments is 1, we can drastically increase the throughput ...| Ash's Blog
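The identity this excerpt relies on, that a fused multiply-add with a multiplier of 1 gives the same result as a plain addition, can be checked with a minimal sketch. It assumes finite inputs and emulates the fused operation in exact rational arithmetic at double precision rather than issuing real `vaddps` or `vfmadd132ps` instructions, so it says nothing about ports or throughput. The names `fma` and `add_via_fma` are hypothetical helpers, not taken from the linked post.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: compute a*b + c exactly, then round once.

    Assumes finite inputs (Fraction cannot represent inf or NaN).
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def add_via_fma(xs, ys):
    """Element-wise addition expressed as FMA with a multiplier of 1.0."""
    return [fma(x, 1.0, y) for x, y in zip(xs, ys)]


xs = [0.1, 1e16, -3.5, 2.0 ** -60]
ys = [0.2, 1.0, 3.5, 1.0]

plain = [x + y for x, y in zip(xs, ys)]
fused = add_via_fma(xs, ys)

# Multiplying by 1.0 is exact, so the single rounding in the FMA is the
# same rounding a plain addition performs.
print(plain == fused)
```

Because `x * 1.0` introduces no rounding error, the only rounding left is the final one, which is why the substitution is safe for results even though the two instructions may be scheduled on different execution ports.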
Takes a third-party crackme and teaches assembly while reverse engineering the target application. Covers data structure analysis, flow validation, and more.| Reverse Engineering
Some mostly too-low-level-to-care-about hardware details of the mask registers introduced in AVX-512.| Performance Matters
A modern CPU is an incredible machine. It can execute many instructions at the same time, it can| specbranch.com
Twitter| Gamozo Labs Blog
Occasionally, I like to peruse uops.info. It is a great resource for micro-optimization:| specbranch.com