Optimizing a Simple Math Problem
Problem: Sum each number in the range of 0 to 1000 that is divisible by either 3 or 5 (or both), divisible meaning that the remainder is 0. This is the first problem posted on Project Euler[1]. The simplest solution to this problem is iterating from 0 to 1000 while checking each number if the remainder of the division is zero. This solution, written in C:
unsigned long sum3or5_simple(unsigned long range) { unsigned long sum = 0; for (unsigned long i = 0; i < range; i++) { if((i % 3) == 0 || (i % 5) == 0) { sum += i; } } return sum; }
This algorithm is nice and well, but it’s O(n) and uses two division instructions (the modulo operator % is implemented by the compiler as division), which are relatively slow.
If you look a bit closer at the algorithm, you will notice that a certain repetition is done. Every 15 iterations result in the same set of instructions, because 3 * 5 = 15.
So what exactly happens in these 15 iterations?
- Numbers i+3, i+6, i+9 and i+12 are divisble by 3.
- Numbers i+5 and i+10 are divisible by 5.
- Number i+0 is divisible by 3 and 5 (note that i+15 will be handled by the next iteration).
This results in:
sum += i+0 + i+3 + i+5 + i+6 + i+9 + i+10 + i+12;
We can now rewrite our original algorithm, in a much faster way, still O(n) but no division instructions and (15 times!) less iterations.
unsigned long sum3or5_improved(unsigned long range) { unsigned long sum = 0, left = range % 15; range -= left; for (unsigned long i = 0; i < range; i += 15) { sum += i+0 + i+3 + i+5 + i+6 + i+9 + i+10 + i+12; } // Process left-overs for (; left != 0; range++, left--) { if((range % 3) == 0 || (range % 5) == 0) { sum += range; } } return sum; }
We still need a small loop for the end of the range in case `range’ is not a multiple of 15, which is the second for-loop in the code above. Also, note that the first loop can be rewritten as:
// Original: sum += i+0 + i+3 + i+5 + i+6 + i+9 + i+10 + i+12; // Reordering `i' and the constants for (unsigned long i = 0; i < range; i += 15) { sum += i + i + i + i + i + i + i + 0 + 3 + 5 + 6 + 9 + 10 + 12; } // Which is equal to for (unsigned long i = 0; i < range; i += 15) { sum += (i * 7) + (0 + 3 + 5 + 6 + 9 + 10 + 12); } // Which is equal to for (unsigned long i = 0; i < range; i += 15) { sum += i * 7 + 45; }
So what just happened? We rewrote an algorithm that loops O(n) times and does approximately O(n*2) divisions to an algorithm that does O(n/15) iterations with one multiplication and two additions per iteration.
Actually, I was surprised when I found out that my compiler optimized it even further to the following: (When using larger values for range, keep in mind that you will likely be interested in using 64bit integers).
for (unsigned long i = 0; i < range * 7; i += 105) { sum += i + 45; } // By doing a little trick we can optimize the addition out, // but you can forget about this. for (unsigned long i = 45; i < range * 7 + 45; i += 105) { sum += i; }
It eliminates the multiplication by multiplying the maximum value and the iteration constant by 7, now resp. range * 7 and 15 * 7 = 105.
However, this algorithm demands more optimalization.. It’s a loop, that just adds some values to another value. What can be done?
To solve this, we will use this nice little equation[2] (which I will not prove for correctness here): (∑(k) from 0 to n) = ½n(n+1).
So basically.. The sum of a range of values, say 0 to 10, equals ½*n(n+1). This equation is proven to be correct, but here is a simple verification: 1+2+3+4+5+6+7+8+9+10 = 55, 10*(10+1)/2 = 55.
The real question is, how will we apply this equation to our algorithm? Let’s calculate a few iterations manually to see exactly what our improved algorithm does.
i = 0: sum = 0 + 0 * 7 + 45 = 45; i = 15: sum = 45 + 15 * 7 + 45 = 195; i = 30: sum = 195 + 30 * 7 + 45 = 450; i = 45: sum = 450 + 45 * 7 + 45 = 810; ...
Now comes the interesting part.. We can solve this in O(1), yay! But how? The improved algorithm consists of two parts:
- The easy part, 45 is added every iteration (which is range/15 times) to the sum. Calculating this results in: sum = … + 45 * (range / 15).
-
The relatively harder part, i is multiplied by 7 and added to the sum for every iteration. However, if we divide the iterator by 15 every iteration, then we get the values 0, 1, 2, …
This means that we can reduce the for-loop using the equation to a simple formula: (∑(k) from 0 to n) with n = (range - 1) / 15.
Note the range-1 because otherwise we include one iteration too much. If we substitute our n into the formula, we get something unreadable, so we will omit this.
After reducing our algorithm to the equation and applying it, we have to “get back” to our algorithm. This is pretty easy, we divided the iterator by 15, so we multiply it again by 15 and we need it 7 times, so we multiply it by 7 as well.
Applying the equation: sum = n * (n + 1) / 2;
Get back to our algorithm: sum = sum * 15 * 7;
Combining these two elements results in a bit unreadable, but totally awesome formula and new algorithm:
unsigned long sum3or5_O1(unsigned long range) { unsigned long left = range % 15; range -= left; // Calculate n unsigned long n = (range - 1) / 15; // Apply the equation unsigned long sum = n * (n + 1) / 2; // "Get back" to our algorithm sum = sum * 15 * 7; // Apply the "easy part" sum = sum + 45 * (range / 15); // Calculate the left over part the old-fashioned way for (; left != 0; range++, left--) { if((range % 3) == 0 || (range % 5) == 0) { sum += range; } } return sum; }
A quick ‘n dirty (aka unreadable) variant of the same algorithm:
unsigned long sum3or5_O1(unsigned long range) { unsigned long left = range % 15; range -= left; unsigned long sum = (range - 1) / 15 * ((range - 1) / 15 + 1) / 2 * 15 * 7 + 45 * (range / 15); // Calculate the left over part the old-fashioned way for (; left != 0; range++, left--) { if((range % 3) == 0 || (range % 5) == 0) { sum += range; } } return sum; }
We now have an algorithm that will calculate the sum for the given range in O(1), this means it’s instant (it doesn’t matter if the range is 1000 or 1e30, it will always return after an almost fixed amount of cpu cycles, note that the second for-loop in sum3or5_O1 will iterate a maximum of 14 times).
Whereas it is possible to calculate the sum for the range upto 1000, as well as the range upto 1e8 (that is, in a couple of seconds), using our first algorithm (sum3or5_simple), it is impossible to calculate the sum for the range upto 1e30 using currently available pc’s on the market, as it will take millions of years.
Solutions:
- sum3or5(1000) = 233168 # sum3or5 being either one of the implementations.
- sum3or5(1e8) = 2333333316666668 # 64bit integers are being used here.
- sum3or5_simple(1e30) = impossible to calculate.
- sum3or5_O1(1e30) = 9661618530273215556 # lower 64 bits, value exceeds a 64bit integer with many digits.
References:
- Problem 1 - Project Euler
- Evaluating Sums - Wikipedia
Wonderful information, neat website style, continue the great work
This can be improved, see: http://shobute.com/posts/project-euler-1 ☺