Some good comments on my BCL prime quiz… but still one unanswered question (at the end)…

Thomas Woelfer quickly got that the place in the BCL for generating prime numbers is in Hashtable.cs and Eric Wilson does a good job explaining why…

*Specifically, primes are used when the size of a hash table must be increased because of the load has reached a certain level. The size of the hashtable is increased such that the number of buckets in the hashtable increases to the smallest prime number that is larger than twice the current number of Hashtable buckets. *

*The reason for this is two fold. Using a prime number of buckets reduces the effect of clustering in the hashtable (having multiple keys hash to the same location). The size is at least doubled because by doing that the BCL can insure that over a series of N insert, the total time for all inserts takes no more than O(log(N)) total time*

And I really love the reference to Juan Felipe Machado gives: “Knuth’s Art of Computer Programming, Vol. 3, p. 528-9”, clearly a desktop reference favorite. How long before it is common place to quote the SLAR v1, pp 239-243 ðŸ˜‰

In addition, Norman Headlam gives some of the code in question, but the method I was really talking about is this one…

00149 private bool Primep (int candidate) {

00150 if ((candidate & 1) != 0) {

00151 for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2){

00152 if ((candidate % divisor) == 0)

00153 return false;

00154 }

00155 return true;

00156 }

00157 else {

00158 return (candidate == 2);

00159 }

00160 }

00161

A few of you pointed out some interesting issues, but not what I was looking for… the problem is in the code above and it results in some numbers being return as prime that are not really prime…The bug can be found for “normal” input ranges such as [1-100].

It should have been

divisor <= (int)Math.Sqrt(candidate) in the for loop condition.

If you pass in 9, the function will return true because divisor starts at 3, which is not less than sqrt(9), so the for loop never runs and the code drops to the return true line.

same if you put in 1 which by definition isn’t prime.

btw, using divisor <= (int)Math.Sqrt(candidate) still returns true for candidate = 1

for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2)

should be:

sqrtcand = Math.Sqrt(candidate);

for (int divisor = 3; divisor < sqrtcand; divisor+=2)

its a minor thing, but you dont want to be calling Math.Sqrt on every loop iteration.

What about:

int sqrtcand = (int)Math.Sqrt(candidate);

for (int divisor = 3; divisor <= sqrtcand; divisor+=2)

Indeed, every squared prime will be considered prime.

Funny how I had a slight doubt yesterday about that ‘<‘ but I didn’t push into it as I was looking for something more subtle…

Cheers,

–Jonathan

<quote>for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2)

should be:

sqrtcand = Math.Sqrt(candidate);

for (int divisor = 3; divisor < sqrtcand; divisor+=2)</quote>

I believe this is neccessary, the compiler knows that candidate is constant and therefore sqrt(candidate) is constant as well and will do this optimisation itself

Actually I was wrong, the nither the C# compilter nor the JIT compiler are able to optimize this themselves, presuably because they don’t know whether Math.Sqrt has any intended Sideeffects besides the result. I wonder if there is a way to indicate this to the compiler..?

<code>

if ((candidate & 1) != 0) {

00000000 push ebp

00000001 mov ebp,esp

00000003 sub esp,20h

00000006 push edi

00000007 push esi

00000008 push ebx

00000009 mov dword ptr [ebp-4],ecx

0000000c mov esi,edx

0000000e xor ebx,ebx

00000010 xor edi,edi

00000012 test esi,1

00000018 je 00000056

for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2){

0000001a mov ebx,3

0000001f nop

00000020 jmp 00000031

if ((candidate % divisor) == 0)

00000022 mov eax,esi

00000024 cdq

00000025 idiv eax,ebx

00000027 test edx,edx

00000029 jne 0000002F

return false;

0000002b xor edi,edi

0000002d jmp 00000063

for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2){

0000002f inc ebx

00000030 inc ebx

00000031 mov dword ptr [ebp-14h],ebx

00000034 mov dword ptr [ebp-20h],esi

00000037 fild dword ptr [ebp-20h]

0000003a fsqrt

0000003c fstp qword ptr [ebp-1Ch]

0000003f push dword ptr [ebp-18h]

00000042 push dword ptr [ebp-1Ch]

00000045 call 725CFA7C

0000004a cmp eax,dword ptr [ebp-14h]

0000004d jg 00000022

return true;

0000004f mov edi,1

00000054 jmp 00000063

return (candidate == 2);

00000056 cmp esi,2

00000059 sete al

0000005c movzx eax,al

0000005f mov edi,eax

00000061 jmp 00000063

}

</code>

versus

<code>

if ((candidate & 1) != 0) {

00000000 push ebp

00000001 mov ebp,esp

00000003 sub esp,20h

00000006 push edi

00000007 push esi

00000008 push ebx

00000009 mov dword ptr [ebp-4],ecx

0000000c mov esi,edx

0000000e mov dword ptr [ebp-0Ch],0

00000015 xor ebx,ebx

00000017 xor edi,edi

00000019 test esi,1

0000001f je 0000005D

int tt = (int)Math.Sqrt (candidate);

00000021 mov dword ptr [ebp-20h],esi

00000024 fild dword ptr [ebp-20h]

00000027 fsqrt

00000029 fstp qword ptr [ebp-1Ch]

0000002c push dword ptr [ebp-18h]

0000002f push dword ptr [ebp-1Ch]

00000032 call 725CF9FC

00000037 mov dword ptr [ebp-0Ch],eax

for (int divisor = 3; divisor < tt;divisor+=2){

0000003a mov ebx,3

0000003f nop

00000040 jmp 00000051

if ((candidate % divisor) == 0)

00000042 mov eax,esi

00000044 cdq

00000045 idiv eax,ebx

00000047 test edx,edx

00000049 jne 0000004F

return false;

0000004b xor edi,edi

0000004d jmp 0000006A

for (int divisor = 3; divisor < tt;divisor+=2){

0000004f inc ebx

00000050 inc ebx

00000051 cmp ebx,dword ptr [ebp-0Ch]

00000054 jl 00000042

return true;

00000056 mov edi,1

0000005b jmp 0000006A

return (candidate == 2);

0000005d cmp esi,2

00000060 sete al

00000063 movzx eax,al

00000066 mov edi,eax

00000068 jmp 0000006A

}

</code>

Use of prime number sizes for the array of hash table buckets is traditional. However the VC++ 7 & 7.1’s hash based containers require a number of buckets that is a power of two:

The integer constant min_buckets specifies the minimum number of buckets to maintain in the hash table. It must be a power of two and greater than zero. The value supplied by hash_compare is 8.

(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcstdlib/html/vclrfHashcompare.asp?frame=true)

(This is the description of the hash_compare template, which you specialise to override the defaults in hash_set and hash_map container instances.)

It does some clever things to avoid rehashing everything as the number of buckets increases.

I was really going for the <= in line 151 as SN points out… Those off-by-one errors are really a kicker aren’t they?

But good to see the other comments as well…

Is this fixed in Widbey?

Yes, this is fixed in Whidbey

Knuth’s mod prime reference is so common, you would think that is the best way to do a hash table. But in fact, if you can get a better hash to start with, you don’t need a mod prime, as the hash is already well distributed. Then you can use faster and better aligned power of 2 sized tables. That mod prime technique has been slow and obsolete for a long time now.

For an example of a more modern hash algorithm, look at FNV:

http://www.isthe.com/chongo/tech/comp/fnv/

That algorithm has already proven itself, speeding up a number of OS hashtables that used the obsolete mod prime method.

FNV is for hashing strings, and is only indirectly related to hashtables and their hash functions.