In research it's common to compare your solution to some simple approach as a baseline, to prove that whatever complex algorithm you've developed is actually worth using. Sometimes, coming up with this baseline is hard. But now we have ChatGPT, so let's plug a hole in my 2015 research paper on floating-point error improvement by comparing my tool, Herbie, with ChatGPT.

Methodology

I've chosen a representative sample of 12 benchmarks from the Herbie benchmark suite. I've biased it a bit toward...