Let’s consider a simple programming challenge: summing all elements in a large array. It stands to reason that this can be easily optimized with parallelism, especially for huge arrays with thousands or millions of elements. It also stands to reason that the parallel version should take roughly the single-threaded time divided by the number of CPU cores. As it turns out, this feat is not that easy to achieve.
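To make the idea concrete, here is a minimal sketch of the naive approach: split the array into one contiguous chunk per worker, sum each chunk separately, then add up the partial sums. The names `sequential_sum` and `chunked_sum` and the `workers` parameter are hypothetical, chosen only for illustration. It also assumes Python threads, which in standard CPython do not execute bytecode in parallel, so treat it as a picture of the decomposition rather than a demonstration of a speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(values):
    """Baseline: add every element on a single thread."""
    total = 0
    for v in values:
        total += v
    return total


def chunked_sum(values, workers=None):
    """Split `values` into contiguous chunks, sum each chunk in a worker
    thread, and combine the partial sums.

    `workers` is a hypothetical parameter; it defaults to the CPU count.
    """
    workers = workers or os.cpu_count() or 1
    if not values:
        return 0

    # Ceiling division, so at most `workers` chunks cover the whole array.
    chunk = -(-len(values) // workers)
    bounds = [
        (start, min(start + chunk, len(values)))
        for start in range(0, len(values), chunk)
    ]

    def partial(bound):
        start, end = bound
        # Slicing copies the chunk, which is acceptable for a sketch.
        return sequential_sum(values[start:end])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker returns one partial sum; add them for the final total.
        return sum(pool.map(partial, bounds))
```

Calling `chunked_sum(data, workers=4)` on a list of integers returns the same total as `sequential_sum(data)`. Whether it actually runs about four times faster is exactly the open question.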