I think that the problem stated in one of my earlier posts is one of the most fascinating puzzles I came across recently. Many people that got confronted with it said bluntly that the problem simply has no solution, otherwise it would contradict common sense, information theory, etc. But surprisingly, it **does **have a solution.

For X% strictly bigger than 50% the solution is known as the MJRTY algorithm, discovered a while back by R.S. Boyer and J.S. Moore (also called the Majority Vote algorithm). Although the algorithm is deceptively simple, its proof is not. the In the PDF cited above, the authors present a complete proof and also some interesting history around this non-trivial result.

The correct algorithm was also found by Lailin Chen - also presented in the post comments, but without a proof:

Well, Since we already knew there is one and only one who has more votes than any others, we can find him by letting the votes "fight" each other:

Picked the first one, say X1, count 1 for X1, then move on, if X1 repeats, add count to X1.

When X1 doesn't repeat, reduce the count by one, if it reaches 0, picked up the current one (say X2), and do the some counting to X2.

Keep doing this until the end. The last one left with a counting bigger than 0 is the leader:

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X3, 1

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X3, 0

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X1, 1

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X1, 0

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X1, 1

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X1, 0

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X3, 1

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X3, 2

X3 X1 X1 X2 X1 X3 X3 X3 X4

^

X3, 1

So X3 is the winner.

Here is a more formal description of the algorithm:

- You have with two registers: an “candidate register” containing the name of some candidate (candidate), and a counter (counter).

- Candidate_register = NULL;

- Counter = 0

- For each new candidate:

If (Candidate_register is NULL)

Candidate_register = candidate

Counter = 1

Else If (candidate == candidate_register)

Counter++;

Else if (counter > 0)

Counter --;

If (counter == 0)

Candidate_register == NULL

- At the end you have the correct candidate in the register.

For X < 50% we can adapt a more complex variation of the same algorithm which requires two passes instead of a single one. Instead of keeping a single register (keeping the last candidate) and its associated counter, we keep a number of K registers (with their associated counters) and run the same algorithm. The integer K is chosen such that 100/(K+1) < X. After this pass, we discovered K candidates but of course the candidate with the biggest counter might **not **be the real candidate. To find the top candidate we simply reset all counters to zero (but keep the candidates found) and re-run the same algorithm.

- Instead of maintaining one “element” register and one counter as in the main algorithm, you need to use K registers, each with K counters.

- First pass

Initialize all candidate registers with NULL and counters with zero

For each candidate in the log file

If the candidate is in one of the K registers, increase its counter

Else if there is one candidate register which is NULL, put it there and increase its counter with 1

Else, reduce all counters by 1

If some counters become zero, set their corresponding candidate registers to NULL

At the end, each of the candidate registers contains every element with frequency > 100/(K+1). One of them is the winner

- Second pass

Initialize all counters with zero, but keep the candidate registers filled with the values above.

For each candidate in the log file

If the candidate is in one of the K registers, increase its counter

At the end, you have all the correct counters for all the candidate registers.

Select the one that has the maximum count.

A variation of this second algorithm is briefly mentioned in this presentation although I could not find a formal article describing it.

What all these algorithms have in common? The fact that you know **ahead of time **that there is a winner, but you don't know which one. While the algorithms are simple to implement, this is a pretty strong condition that is not likely to be encountered in our real world. Maybe on a planet called Xen 🙂

The (generalization of the) problem is called heavy hitters. A google search on it reveals more nice info 😉 It has quite interesting applications in networking.

Also you can find some discution about this problem here: http://infoarena.ro/blog/problema-majoritatii

The discution I linked to is in Romanian, sorry.