When solving Project Euler problems I frequently need to iterate over the prime numbers less than a given n. The Sieve of Eratosthenes finds small primes quickly and easily. More complicated methods exist for finding larger primes, but with a couple of tweaks the Sieve of Eratosthenes can reach quite high.
A naive implementation for finding the set of primes below n will:
1. Allocate an array of n booleans, initialized to false.
2. Allocate an empty list.
3. For each i in the range 2 to n:
   a) If the boolean value at this index in the array is true, i is composite. Skip to the next value and check that.
   b) If the boolean value at this index in the array is false, i is prime!
   c) Add i to the list of primes.
   d) For each multiple of i in the range 2i to n, set the boolean value at that index in the array to true.
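The steps above can be sketched in a few lines of Python. This is only an illustration of the naive algorithm as described, not the attached source; the function name `naive_primes_below` and the variable names are my own assumptions.

```python
def naive_primes_below(n):
    """Return a list of the primes less than n, following the naive steps."""
    composite = [False] * max(n, 0)   # one boolean per index, all false
    primes = []                       # the list of primes found so far
    for i in range(2, n):
        if composite[i]:
            continue                  # i was marked earlier, so it is composite
        primes.append(i)              # never marked, so i is prime
        # Mark every multiple of i from 2i up to (but not including) n.
        for multiple in range(2 * i, n, i):
            composite[multiple] = True
    return primes
```

For example, `naive_primes_below(30)` returns the ten primes from 2 through 29.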
There are a handful of simple optimizations that can be made to this naive implementation:
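- Step 3d) has no effect until the multiple of i reaches i², because every smaller multiple k·i with k < i has a prime factor smaller than i and was already marked on an earlier pass. The range can therefore be changed to "i² to n".
- As a direct consequence, step 3d) can be skipped entirely once i² passes n.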
- Instead of allocating an array of n booleans, an array of n bits will suffice.
- Every even-indexed bit is set to true on the first pass (i = 2). Instead, treat 2 as a special case that is known to be prime, and allocate bits only for odd-numbered values, which halves the memory. Change the outer loop in 3) to "in the range 3 to n", incrementing by two each time. Change the loop in 3d) to increment by 2i each time, so that it visits only odd multiples.
- Storing the list of primes takes a lot of memory - more than the sieve. Don't bother creating a list of primes, just write an enumerator that travels the sieve directly.
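A minimal Python sketch that combines these optimizations might look like the following. It is not the attached source: the name `primes_up_to`, the bit layout (bit k stands for the odd number 2k + 1), and the use of a `bytearray` as the bit array are all assumptions made for illustration. It also includes n itself in the range.

```python
def primes_up_to(n):
    """Yield the primes <= n in increasing order from an odd-only bit sieve."""
    if n < 2:
        return
    yield 2                                # 2 is handled as a special case

    # Bit k stands for the odd number 2*k + 1; bit 0 (the number 1) is unused.
    nbits = (n + 1) // 2
    sieve = bytearray((nbits + 7) // 8)    # all bits clear: not marked composite

    def is_composite(k):
        return sieve[k >> 3] & (1 << (k & 7))

    def mark_composite(k):
        sieve[k >> 3] |= 1 << (k & 7)

    # Sieve only while i*i <= n; beyond that the marking loop would be empty.
    i = 3
    while i * i <= n:
        if not is_composite(i >> 1):
            # Start at i*i and step by 2*i, so only odd multiples are visited.
            for multiple in range(i * i, n + 1, 2 * i):
                mark_composite(multiple >> 1)
        i += 2

    # Walk the sieve directly instead of storing a list of primes.
    for k in range(1, nbits):
        if not is_composite(k):
            yield 2 * k + 1
```

Because `primes_up_to` is a generator, a caller can write `for p in primes_up_to(100): ...` and the only large allocation is the sieve itself. Pure Python will be far slower than a compiled implementation, so this shows the structure rather than the performance.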
With these optimizations I can enumerate primes from 2 up to 5 billion (5 × 10⁹) in about seven minutes. Source and binaries attached.
Will enumerate primes <= 5000000000 = 5e+009
Memory for sieve: 298.023 MB
Initialization complete: 983 milliseconds since start
Sieving to 70711
Sieving complete: 4.70292 minutes since start
Picking up the rest to 5000000000
Pickup complete: 6.12252 minutes since start
Enumerating complete: 7.43683 minutes since start
Freeing CPrimes object
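Two of the figures in this output can be checked with a little arithmetic. The sketch below assumes one bit per odd number and that "MB" here means 2^20 bytes; both are my assumptions about the program, and the variable names are illustrative.

```python
import math

n = 5_000_000_000

# One bit for each odd number up to n, packed eight to a byte.
odd_bits = (n + 1) // 2
sieve_bytes = (odd_bits + 7) // 8
sieve_mib = sieve_bytes / 2**20            # about 298.023

# Smallest integer whose square is at least n: sieving can stop there.
sieve_limit = math.isqrt(n - 1) + 1        # 70711
```

Under those assumptions the sieve needs 312,500,000 bytes, which matches the reported 298.023 MB, and 70711 is the first integer whose square exceeds 5 billion.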
There are other sieves, such as the Sieve of Atkin, that perform better at the cost of being much more complex. So far I haven't had to resort to any of them.
EDIT September 28 2015: moved source to https://github.com/mvaneerde/blog/tree/master/primes