When is random “random enough” ?

An interesting question came up in a private conversation. If you want to pick a random value from a random generated result set and want to make sure that the row is really random, is the known ORDER BY NEWID() really random enough ? The point is that once you have one value and knowing the algorithm you could calculate the next value derived from the first value. So, what could you do to make this even more random ? To state something upfront, making it harder to guess the right value will have to be paid by extra computing cycles. No matter which way you choose, you will have to make sure to find the best way (in terms of computing investment and grade of randomness).

One possible solution is the following:

  • Generate two different NewId()s, order by these and generate a RowNumber from both of them
  • Afterwards join on the RowNumbers (as the highest values are the same) and pick one the value from the Joined result set order by another unique value.
  • If that is not random enough, you could create another RowNumber from another NewId() and join this to the second RowNumber and so on…. (Depending on the grade of paranoia 🙂

A sample could be something like the following:



DROP TABLE dbo.Participant


CREATE TABLE dbo.Participant
    participantId INT,
    FirstName VARCHAR(255),
    LastName VARCHAR(255),
    Speciality VARCHAR(MAX)


INSERT INTO dbo.Participant 
    (1,'Luke', 'Skywalker', 'The force is strong with him') ,
    (2,'Hang', 'Solo', 'Made the Kessel run in less than 12 parsecs (But isn´t that a unit of distance and not time ?!)'),
    (3, 'Chewbacca', 'n/a', 'Doing something like R-R-R-R-R all the time') 

WITH ParticipantCTE(participantId,RowNumber1,RowNumber2,ThedeterministicGuid)
        ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNumber1, 
        ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNumber2,
        NEWID() AS ThedeterministicGuid
    FROM dbo.Participant
SELECT TOP(1) P.ParticipantId
FROM dbo.Participant P
INNER JOIN ParticipantCTE N2
ON N2.participantId = P.participantId
INNER JOIN ParticipantCTE N1
ON N1.RowNumber1 = N2.RowNumber2
Order by N2.ThedeterministicGuid


A sample run with 100000 iterations showed that is will be evenly dispersed.

ParticipantId Counted
------------- -----------
1             33712
2             33226
3             33062


Comments (2)

  1. Adam Machanic says:

    Hi Jens,

    I had a conversation with Itzik Ben-Gan on this topic some time ago and he said that he had done some experimentation and discovered somewhat better randomization by doing CHECKSUM(NEWID()).  I don’t know the full details but given the source it’s certainly worth playing around with.