BinaryInsert Part 2


Previously I discussed a potentially missing API, List(Of T).BinaryInsert.  One of the points I made was that it would have better performance: O(Log N) versus the O(N Log N) of inserting and then sorting.  Several readers correctly pointed out this was incorrect, since Insert() has the additional overhead of an Array.Copy(), which is roughly O(N).  But most agreed that O(N) + O(Log N) was still better than O(N Log N). 

Given that I had already missed a key detail, I decided to write a test program to try out the various methods.  Caveat: I’m not a performance guy.  While I find performance intriguing and interesting, it is by no means my specialty, and any single performance test is unlikely to capture all real-world scenarios.  However, I did find the results a bit surprising.  The test code I wrote is at the bottom of the post. 

Here is the summary output. 

Force Jit
BinaryInsert:     00:00:00.0051167
Insert Then Sort: 00:00:00.0000251
Range (0-99)
BinaryInsert:     00:00:00.0000266
Insert Then Sort: 00:00:00.0000316
Random 10
BinaryInsert:     00:00:00.0000053
Insert Then Sort: 00:00:00.0000034
Random 100
BinaryInsert:     00:00:00.0000294
Insert Then Sort: 00:00:00.0000235
Random 1000
BinaryInsert:     00:00:00.0004917
Insert Then Sort: 00:00:00.0001526
Random 10000
BinaryInsert:     00:00:00.0261899
Insert Then Sort: 00:00:00.0018287
Random 100000
BinaryInsert:     00:00:02.4289054
Insert Then Sort: 00:00:00.0237019

As you can see, based on my sample program, BinaryInsert is much slower than Insert and Sort.  I ran the profiler against this and verified the suspicion that List(Of T).Insert() took the vast majority of the time. 

Perhaps there is a reason BinaryInsert is missing.
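
The BinaryInsert call in the test code below is the extension method from the previous post.  A minimal sketch of what such an extension could look like, built on List(Of T).BinarySearch (this is an illustration of the shape, not necessarily the original implementation), is:

Imports System.Runtime.CompilerServices

Module ListExtensions

    ' Sketch only: insert value into an already-sorted list, keeping it sorted.
    ' BinarySearch returns the index of a match, or the bitwise complement of
    ' the index of the next larger element when there is no match.
    <Extension()> _
    Sub BinaryInsert(Of T)(ByVal list As List(Of T), ByVal value As T, ByVal comp As IComparer(Of T))
        Dim index = list.BinarySearch(value, comp)
        If index < 0 Then
            index = Not index
        End If
        list.Insert(index, value)
    End Sub

End Module

Even with the O(Log N) search, each call still pays for the O(N) Array.Copy inside Insert, which is what the numbers above reflect.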

Module Module1

    Function BinaryInsert(Of T)(ByVal enumerable As IEnumerable(Of T), ByVal comp As IComparer(Of T)) As TimeSpan
        Dim list As New List(Of T)
        Dim watch As New Stopwatch()

        watch.Start()
        For Each value In enumerable
            list.BinaryInsert(value, comp)
        Next
        watch.Stop()
        Return watch.Elapsed
    End Function

    Function InsertAllThenSort(Of T)(ByVal enumerable As IEnumerable(Of T), ByVal comp As IComparer(Of T)) As TimeSpan
        Dim list As New List(Of T)
        Dim watch As New Stopwatch()

        watch.Start()
        For Each value In enumerable
            list.Add(value)
        Next
        list.Sort(comp)
        watch.Stop()
        Return watch.Elapsed
    End Function

    Sub TestBoth(Of T)(ByVal title As String, ByVal enumerable As IEnumerable(Of T))
        TestBoth(title, enumerable, Comparer(Of T).Default)
    End Sub

    Sub TestBoth(Of T)(ByVal title As String, ByVal enumerable As IEnumerable(Of T), ByVal comp As IComparer(Of T))
        Dim copy = New List(Of T)(enumerable)
        Dim elapsedBinary = BinaryInsert(New List(Of T)(copy), comp)
        Dim elapsedSort = InsertAllThenSort(New List(Of T)(copy), comp)

        Console.WriteLine(title)
        Console.WriteLine("BinaryInsert:     {0}", elapsedBinary)
        Console.WriteLine("Insert Then Sort: {0}", elapsedSort)
    End Sub

    Function Range(ByVal start As Integer, ByVal count As Integer) As List(Of Integer)
        Dim list = New List(Of Integer)
        ' Mirror Enumerable.Range: count values beginning at start.
        For i = start To start + count - 1
            list.Add(i)
        Next
        Return list
    End Function

    Function Random(ByVal count As Integer) As List(Of Integer)
        Dim rand As New Random()
        Dim list = New List(Of Integer)
        For i = 0 To count - 1
            list.Add(rand.Next())
        Next
        Return list
    End Function

    Sub Main()
        TestBoth("Force Jit", New Integer() {1})
        TestBoth("Range (0-99)", Range(0, 100))
        TestBoth("Random 10", Random(10))
        TestBoth("Random 100", Random(100))
        TestBoth("Random 1000", Random(1000))
        TestBoth("Random 10000", Random(10000))
        TestBoth("Random 100000", Random(100000))
    End Sub

End Module

Comments (7)

  1. Mike says:

    Well, if you want to compare the performance of two pieces of code, you need to make sure that they are equivalent; otherwise you may end up comparing apples and oranges.

    In this particular case there is a subtle difference between InsertAllThenSort and BinaryInsert.  BinaryInsert has an additional property that InsertAllThenSort does not have: it keeps the list sorted while adding.

    If you move list.Sort() inside the For Each loop you may find that BinaryInsert is faster, or maybe not :). I have not tried it.

  2. Jeff Moser says:

    Inserting at the end of the List is mostly O(1) since you’re just filling in a pre-allocated array slot. The exception is when you exceed the list Capacity and the list grows by a factor of 2, resulting in an O(N = 2 * Previous Capacity) copy.

    So, when you do insert and then sort, you’re doing an O(1) operation followed by code that does comparisons and memory swaps but no copies. The latter tends to be quicker than all the copies involved in the expected case of BinaryInsert (that is, the case where you didn’t get lucky and couldn’t simply insert at the end).

    This can be seen more clearly if you update your test code to this:

    Sub TestBoth(Of T)(ByVal title As String, ByVal enumerable As IEnumerable(Of T))
        Dim copy = New List(Of T)(enumerable)
        Dim binaryInsertComparer As New CountingComparer(Of T)
        Dim elapsedBinary = BinaryInsert(New List(Of T)(copy), binaryInsertComparer)
        Dim insertAllThenSortComparer As New CountingComparer(Of T)
        Dim elapsedSort = InsertAllThenSort(New List(Of T)(copy), insertAllThenSortComparer)

        Console.WriteLine(title)
        Console.WriteLine("BinaryInsert:     {0} with {1:0,0} comparisons", elapsedBinary, binaryInsertComparer.TotalComparisons)
        Console.WriteLine("Insert Then Sort: {0} with {1:0,0} comparisons", elapsedSort, insertAllThenSortComparer.TotalComparisons)

        Dim compareDiff = insertAllThenSortComparer.TotalComparisons - binaryInsertComparer.TotalComparisons
        Dim minTicks = Convert.ToDouble(Math.Min(elapsedBinary.Ticks, elapsedSort.Ticks))
        Dim maxTicks = Convert.ToDouble(Math.Max(elapsedBinary.Ticks, elapsedSort.Ticks))
        Console.WriteLine("Insert then sort used {0:0,0} ({1:0%}) more comparisons, but about {2:0,0} less elements were copied, making it {3:0%} {4}", _
                          compareDiff, _
                          Convert.ToDouble(compareDiff) / binaryInsertComparer.TotalComparisons, _
                          (Convert.ToInt64(copy.Count) * (copy.Count + 1)) / 2, _
                          (maxTicks - minTicks) / minTicks, _
                          If(elapsedSort < elapsedBinary, "faster", "slower"))
    End Sub

    Function Random(ByVal count As Integer) As IEnumerable(Of Integer)
        Dim rand As New Random()
        Return Enumerable.Range(0, count).Select(Function(x) rand.Next())
    End Function

    Public Class CountingComparer(Of T)
        Implements IComparer(Of T)

        Private ReadOnly m_ActualComparer As IComparer(Of T)
        Private m_TotalComparisons As Integer

        Public Sub New()
            m_ActualComparer = Comparer(Of T).Default
        End Sub

        Public ReadOnly Property TotalComparisons() As Integer
            Get
                Return m_TotalComparisons
            End Get
        End Property

        Public Function Compare(ByVal x As T, ByVal y As T) As Integer Implements System.Collections.Generic.IComparer(Of T).Compare
            m_TotalComparisons += 1
            Return m_ActualComparer.Compare(x, y)
        End Function
    End Class

    Sub Main()
        TestBoth("Force Jit", New Integer() {1})
        TestBoth("Range (0-99)", Enumerable.Range(0, 100))
        TestBoth("Random 10", Random(10))
        TestBoth("Random 100", Random(100))
        TestBoth("Random 1000", Random(1000))
        TestBoth("Random 10000", Random(10000))
        TestBoth("Random 100000", Random(100000))
    End Sub

    This gives:

    Force Jit
    Range (0-99)
    BinaryInsert:     00:00:00.0002988 with 573 comparisons
    Insert Then Sort: 00:00:00.0006341 with 768 comparisons
    Insert then sort used 195 (34%) more comparisons, but about 5,050 less elements were copied, making it 112% slower
    Random 10
    BinaryInsert:     00:00:00.0000073 with 24 comparisons
    Insert Then Sort: 00:00:00.0000054 with 52 comparisons
    Insert then sort used 28 (117%) more comparisons, but about 55 less elements were copied, making it 35% faster
    Random 100
    BinaryInsert:     00:00:00.0000310 with 531 comparisons
    Insert Then Sort: 00:00:00.0000315 with 1,010 comparisons
    Insert then sort used 479 (90%) more comparisons, but about 5,050 less elements were copied, making it 2% slower
    Random 1000
    BinaryInsert:     00:00:00.0004601 with 8,572 comparisons
    Insert Then Sort: 00:00:00.0003767 with 14,947 comparisons
    Insert then sort used 6,375 (74%) more comparisons, but about 500,500 less elements were copied, making it 22% faster
    Random 10000
    BinaryInsert:     00:00:00.0171482 with 119,058 comparisons
    Insert Then Sort: 00:00:00.0046164 with 178,121 comparisons
    Insert then sort used 59,063 (50%) more comparisons, but about 50,005,000 less elements were copied, making it 271% faster
    Random 100000
    BinaryInsert:     00:00:01.8321850 with 1,522,690 comparisons
    Insert Then Sort: 00:00:00.0534849 with 2,162,251 comparisons
    Insert then sort used 639,561 (42%) more comparisons, but about 5,000,050,000 less elements were copied, making it 3326% faster

    The last line gives clues as to why going to 1,000,000 elements or more would take quite some time, but the overall idea is there.

    Kudos for profiling your code. This was an interesting aspect to explore.

    P.S. I updated your code to use the Enumerable.Range method that comes with LINQ.

  3. Mike,

    I agree with what you said about making sure you’re actually comparing the same values.  I should have been clearer here.  In this case I was considering the code for the following scenario.  

    There is a long-lived search which returns chunks of data at a time in an unordered fashion.  The display for this data is continually updated and needs to stay sorted.  So I wanted to find the fastest way to insert a set of values into an already-sorted list.  That’s why I designed this performance "test" the way I did.  
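
    Concretely, the two per-chunk strategies for that scenario can be sketched like this (the names here are illustrative only, and Option 1 assumes the BinaryInsert extension from the previous post):

    ' Sketch: a long-lived, always-sorted display list receiving unordered chunks.
    Sub OnChunkArrived(Of T)(ByVal display As List(Of T), ByVal chunk As IEnumerable(Of T), ByVal comp As IComparer(Of T))
        ' Option 1: keep the list sorted at every step.
        ' For Each value In chunk
        '     display.BinaryInsert(value, comp)
        ' Next

        ' Option 2: append the whole chunk, then re-sort the entire list.
        For Each value In chunk
            display.Add(value)
        Next
        display.Sort(comp)
    End Sub

    Which option wins depends on the size of the chunk relative to the size of the display list.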

  4. Jeff,

    Thanks for the detailed analysis there.  I love the idea of shimming in a counting comparer class which accurately tracks the total number of comparisons used.  

  5. Mike says:

    jaredpar,

    OK, so if I understand it correctly, the list where you insert stuff lives longer than the BinaryInsert/InsertAndSort methods, and such an insert method is called with that list and a "chunk" of data.

    If my understanding is correct, then the size of the chunk will make a difference. For example, if the total number of values to insert is 10000 and the chunk size is 10, then BinaryInsert wins hands down (0.4ms vs 40ms). For a chunk size of 100 they’re equal, and for a chunk size of 1000 InsertAndSort wins (0.08ms vs 0.4ms).

    Thanks for posting this, it’s fun to do this kind of stuff.

    Jeff,

    "does comparisons and memory swaps but no copies"

    Not to be nitpicking, but what is a memory swap? Is there such a wonderful operation called a memory swap that does not involve at least 2 copies? 🙂

    For the record: I "stole" the quicksort implementation from Rotor and modified it to count the swaps. The result: about 30000 swaps for an array of 10000 Int32s. That means 60000 copies, which means you can do at least 6 worst-case BinaryInserts in the same time.

  6. Jeff Moser says:

    Mike:

    You’re right that a memory swap copies contents in a sense. What I was referring to was a "bulk copy" of everything in the array past the point of insertion.

    If you have the list [2, 3, 5, 7] and insert "4", you’d note that you’ve exceeded the capacity of 4 elements and then the list would grow to capacity * 2 = 8: (X denotes a reserved but unused slot)

    [2, 3, 5, 7, X, X, X, X]

    via an Array.Copy

    and then another bulk copy (Array.Copy) to give:

    [2, 3, 5, 5, 7, X, X, X]

    (Note the repeated 5)

    and then a "swap" / update to give

    [2, 3, 4, 5, 7, X, X, X]

    When I said "swap", I meant the array size isn’t touched and no bulk copies are needed.

    I didn’t realize that "Insert" is a bit inefficient in the design and thus does 2 bulk copies. So, the elements copied number is even worse than I reported.

    What is interesting is that if you know you won’t repeat elements, you can use a SortedDictionary<T, Boolean>. Under the covers this uses a red-black tree implementation. The really interesting bit is that the red-black tree is still slightly slower than the insert-then-sort approach. This is probably because it doesn’t take advantage of the speedy L1 and L2 caches for the contiguous chunks of memory that Array uses. In addition, retrieval by index becomes O(log n) instead of O(1).

    It’s all about trade-offs 🙂 The real question is exactly how you will be using it, which determines what’s needed.

  7. Mike says:

    "I didn’t realize that "Insert" is a bit inefficient in the design and thus does 2 bulk copies. So, the elements copied number is even worse than I reported."

    Not really. It only does 2 bulk copies when it needs to expand the array, but since expanding is done by doubling the size, one of the copies can be taken out of the picture. After all, List.Add() does the same bulk copy when it needs to expand the array.

    In the end it is one bulk copy from an Insert without expansion, compared with the number of swaps/copies generated by quicksort, which as I detailed above seems to be quite large.

    And indeed it’s all about trade offs. That’s why I find this kind of stuff to be quite fun.