How fast can we compute the dot product?
Eric Bainville - Dec 2007C code
A straightforward C implementation of the dot product is the following:
Real dot(int n,const Real * x,const Real * y)
{
Real s = 0;
for (int i=0;i<n;i++) s += x[i] * y[i];
return s;
}
Here Real is either float or double. This code runs at 3.00 cycles per component both for float and double, because gcc generates the same non-SIMD SSE code. We can improve the running time by unrolling the loop, and separating the even and odd components in the sum:
Real dot(int n,const Real * x,const Real * y)
{
int i;
Real s0,s1;
s0 = s1 = 0;
for (i=n>>1;i>0;i--)
{
s0 += x[0] * y[0];
s1 += x[1] * y[1];
x += 2;
y += 2;
}
return s0 + s1;
}
This code runs at 2.00 cycles per component for float and double. Unrolling further the loop to 4 components per iteration does not change the running time.
| SSE dot product : Introduction | Top of Page | SSE dot product : SSE code |
