Coastal Vectors

Project Euler: 10

Problem 10 of Project Euler is to find the sum of all primes under 2,000,000.

This isn’t a hard problem in most languages. I have recently been going through and re-solving problems in various languages for the fun of it. Just for fun, I decided to solve some problems using Bash scripting. Bash has all the constructs of a real programming language, however, the main draw back for problems like this is that it’s slow. VERY slow. Well, lets see what we can do to speed this problem up.

Brute Force

The brute force approach is pretty simple. Loop through every number, n, less than 2,000,000, and try dividing it by every number from 2 to n-1. If it is divisible, break out of the loop and try the next number.

sum=2
for (( i=3; i<2000000; i++ )); do
    prime=1
    for (( d=2; d<$i; d++ )); do
    if [ $(($i % $d)) -eq 0 ]; then
        prime=0
        break;
    fi
    done
    if [ $prime -eq 1 ]; then
    sum=$(($sum + $i))
    fi
done
echo $sum

This solution has no optimization, but will still yield a result in a relatively small time-frame for most languages. This would take incredibly long to process primes smaller than 2,000,000. For a benchmark, I have run this through primes less than 2,000, and it took 6.430 seconds. We need to optimize.

Optimize

There are a lot of simple changes that we can do to get our compute times down. For instance, if we only loop through odd numbers, we have half as many to process.

Another classic technique to optimize a prime test is to check for factors less than or equal to the square root of the number. The reason this works is that if a factor of our number, n, is larger than the square root of n, we will find the other factor has to be smaller than the square root of n, and will thus be detected.

sum=2
for (( i=3; i<2000; i+=2 )); do
    prime=1
    for (( d=2; d*d<=$i; d++ )); do
    if [ $(($i % $d)) -eq 0 ]; then
        prime=0
        break;
    fi
    done
    if [ $prime -eq 1 ]; then
    sum=$(($sum + $i))
    fi
done
echo $sum

These two changes alone bring our compute time down to .319 seconds for primes under 2,000! I still wouldn’t recommend trying computing all primes under 2,000,000. I tried, and killed it after 3 days without a result!

Sieve of Eratosthenes

Prime sieves are a super fast way of computing prime numbers without having to do any division. This method is attributed to Eratosthenes of Cyrene, a Greek mathematician living about 200 BC. (You can read more about the history on Wikipedia).

Basically, you have an array of every number you want to check. Start by making each element of the array equal to 1, indicating it is a prime. Go through the array, starting at two. Whenever you hit a value that is 1, mark all the multiples of that value as 0’s. So, we look at element 2, which is prime because it’s value is 1. element whose index is a multiple of 2 should then get marked as 0, since that cannot be prime. You then advance to the next element, and repeat until you’ve visited every number in the array.

limit=2000
primes[$limit]=0
primeList[0]=2
primeIndex=0
sum=0
for (( i=2; i<$limit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        sum=$(($sum+$i))
        primeList[$primeIndex]=$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$limit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

echo $sum

Implementing this method definitely improves results for our benchmark of primes less than 2,000; .186 seconds! However, this approach does not scale up to the desired 2,000,000 because Bash arrays are super slow as they get larger. Even though this approach worked great in our benchmark, it will still take upwards of 3 days to complete for all the primes under 2,000,000.

Combination of Methods

By using a sieve method combined with our optimized method, I was able to finally get the correct answer in 34 minutes using a Bash script.

I used a sieve to generate the primes between 1 and sqrt(2000000). This was fast since the array wasn’t nearly as big. I then used the array of primes that I generated to check every odd number between 2 and 2,000,000.

Another shortcut to reduce computation is to make a second array containing the squares of each prime. This means that we don’t need to perform multiplication in each loop when we are checking for primality.

limit=2000000
sum=0

# Generate primes up to sqrt of 2000000 using sieve
primeLimit=$(echo "scale=0;sqrt($limit*2)" | bc)
primes[$primeLimit]=0
primeList[0]=2
primeListSquared[0]=4
primeIndex=0
for (( i=2; i<$primeLimit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        sum=$(($sum+$i))
        primeList[$primeIndex]=$i
        primeListSquared[$primeIndex]=$i*$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$primeLimit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

# Make sure primeLimit is an odd number
[[ $(($primeLimit % 2)) -eq 0 ]] && primeLimit=$(($primeLimit + 1))

# Use known primes from above to check each remaining number for primality
for (( i=$primeLimit; i<$limit; i+=2 )); do
    prime=1
    for (( p=0; ${primeListSquared[$p]}<=$i; p++ )); do
        if [[ $(($i % ${primeList[$p]})) -eq 0 ]]; then
            prime=0
            break
        fi
    done
    # Record!
    if [[ $prime -eq 1 ]]; then
        sum=$(($sum+$i))
    fi
done

echo $sum

34 minutes. Finally, a “reasonable” result (by some definitions of reasonable).

Multi-Processing

The last step I took was by splitting up the task of finding primes into multiple processes. This allowed all 8 of my cores to work on the solution. I moved my prime search method into a function that took a start and stopping point as parameters, and then echoed the sum for that portion back to a while loop that added them all together.

limit=2000000

# Generate primes up to sqrt of limit
primeLimit=$(echo "scale=0;sqrt($limit*2)" | bc)
primes[$primeLimit]=0
primeList[0]=2
primeListSquared[0]=4
primeIndex=0
for (( i=2; i<$primeLimit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        primeList[$primeIndex]=$i
        primeListSquared[$primeIndex]=$i*$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$primeLimit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

function checkRange() {
    start=$1
    end=$2
    rangeSum=0

    # Make sure range starts at an odd number
    [[ $start -lt 2 ]] && start=3 && rangeSum=2
    [[ $(($start % 2)) -eq 0 ]] && start=$(($start + 1))

    # Use known primes to check all remaining numbers for primality
    for (( i=$start; i<$end; i+=2 )); do
        prime=1
        for (( p=0; ${primeListSquared[$p]}<=$i; p++ )); do
            if [[ $(($i % ${primeList[$p]})) -eq 0 ]]; then
                prime=0
                break
            fi
        done
        # Record!
        if [[ $prime -eq 1 ]]; then
            rangeSum=$(($rangeSum+$i))
        fi
    done
    echo $rangeSum
}

# Split up task into chunks, and check with multiple processes
sum=0
range=$(($limit / 50))
for (( start=0; start<$limit; start+=$range )); do
    checkRange $start $(($start+$range)) &
done | while read result; do
    sum=$(($sum+$result))
    echo $sum
done | tail -1

This method brought the compute time down to 3 minutes 22.075 seconds.

Conclusion

Project Euler gives the guideline that if your solution takes more than a minute, you’re probably doing it wrong. For this particular task though, I think 3 minutes is rather successful given the constraints of Bash. It was a fun project to try to optimize such a relatively simple algorithm within the constraints of this little scripting language.

If you can find any way to speed up the results, please let me know!

Visit the solution on my GitHub page.

This site uses Akismet to reduce spam. Learn how your comment data is processed.