How fast can we compute the dot product?
Eric Bainville - Dec 2007C code
A straightforward C implementation of the dot product is the following:
Real dot(int n,const Real * x,const Real * y) { Real s = 0; for (int i=0;i<n;i++) s += x[i] * y[i]; return s; }
Here Real is either float or double. This code runs at 3.00 cycles per component both for float and double, because gcc generates the same non-SIMD SSE code. We can improve the running time by unrolling the loop, and separating the even and odd components in the sum:
Real dot(int n,const Real * x,const Real * y) { int i; Real s0,s1; s0 = s1 = 0; for (i=n>>1;i>0;i--) { s0 += x[0] * y[0]; s1 += x[1] * y[1]; x += 2; y += 2; } return s0 + s1; }
This code runs at 2.00 cycles per component for float and double. Unrolling further the loop to 4 components per iteration does not change the running time.
SSE dot product : Introduction | Top of Page | SSE dot product : SSE code |