A few years back, I found a simple trick in tandem with SIMD intrinsics to truly unleash the power of contemporary CPUs. I put the strstr of LibC and the std::search of the C++ Standard Templates Library to the test and hit a throughput of around 1.5 GB/s for substring search on a solitary core. Not too shabby, right? But imagine, that the memory bandwidth could theoretically reach a striking 10-15 GB/s per core.