Hi, I’ve just finished coding up and debugging a 1D and 3D shock physics benchmark for Zig. The larger 3D Zig benchmark runs about 15% faster on my machine than the C reference implementation compiled with gcc -O3 (which is itself faster than gcc -O2). I used the aarch64 tarball from the download page for my run. I notice that there are a lot of unexploited “low-hanging fruit” peephole optimization opportunities in the generated floating-point assembly, e.g. fmadd/fmsub/fnmsub. Zig can only get much fast...