Previously I discussed a potential missing API in List(Of T).BinaryInsert. One of the items I mentioned was it had better performance because it was O(Log N) vs Insert and Sort which is O(NLogN). Several users correctly pointed out this was incorrect and that Insert() had the additional overhead of an Array.Copy() which is O(N)ish. But most agreed O(N) + O(LogN) was better than O(NLogN).

Given that I already missed a key portion, I decided to write a test program to try out the various methods. Caveat: I'm not a performance guy. While I find performance intriguing and interesting it is by no means my specialty. Any single performance test is unlikely to capture all real world scenarios. However I did find the results a bit surprising. At the bottom of the post is the test code I wrote.

Here is the summary output.

Force Jit

BinaryInsert: 00:00:00.0051167

Insert Then Sort: 00:00:00.0000251

Range (0-99)

BinaryInsert: 00:00:00.0000266

Insert Then Sort: 00:00:00.0000316

Random 10

BinaryInsert: 00:00:00.0000053

Insert Then Sort: 00:00:00.0000034

Random 100

BinaryInsert: 00:00:00.0000294

Insert Then Sort: 00:00:00.0000235

Random 1000

BinaryInsert: 00:00:00.0004917

Insert Then Sort: 00:00:00.0001526

Random 10000

BinaryInsert: 00:00:00.0261899

Insert Then Sort: 00:00:00.0018287

Random 100000

BinaryInsert: 00:00:02.4289054

Insert Then Sort: 00:00:00.0237019

As you can see, based on my sample program, BinaryInsert is much slower than Insert and Sort. I ran the profiler against this and verified the suspicion that List(Of T).Insert() took the vast majority of the time.

Perhaps there is a reason BinaryInsert is missing.

Module Module1 Function BinaryInsert(Of T)(ByVal enumerable As IEnumerable(Of T), ByVal comp As IComparer(Of T)) As TimeSpan Dim list As New List(Of T) Dim watch As New Stopwatch() watch.Start() For Each value In enumerable list.BinaryInsert(value, comp) Next watch.Stop() Return watch.Elapsed End Function Function InsertAllThenSort(Of T)(ByVal enumerable As IEnumerable(Of T), ByVal comp As IComparer(Of T)) As TimeSpan Dim list As New List(Of T) Dim watch As New Stopwatch() watch.Start() For Each value In enumerable list.Add(value) Next list.Sort(comp) watch.Stop() Return watch.Elapsed End Function Sub TestBoth(Of T)(ByVal title As String, ByVal enumerable As IEnumerable(Of T)) TestBoth(title, enumerable, Comparer(Of T).Default) End Sub Sub TestBoth(Of T)(ByVal title As String, ByVal enumerable As IEnumerable(Of T), ByVal comp As IComparer(Of T)) Dim copy = New List(Of T)(enumerable) Dim ellapsedBinary = BinaryInsert(New List(Of T)(copy), comp) Dim ellapsedSort = InsertAllThenSort(New List(Of T)(copy), comp) Console.WriteLine(title) Console.WriteLine("BinaryInsert: {0}", ellapsedBinary) Console.WriteLine("Insert Then Sort: {0}", ellapsedSort) End Sub Function Range(ByVal start As Integer, ByVal count As Integer) As List(Of Integer) Dim list = New List(Of Integer) For i = start To count - 1 list.Add(i) Next Return list End Function Function Random(ByVal count As Integer) As List(Of Integer) Dim rand As New Random() Dim list = New List(Of Integer) For i = 0 To count - 1 list.Add(rand.Next()) Next Return list End Function Sub Main() TestBoth("Force Jit", New Integer() {1}) TestBoth("Range (0-99)", Range(0, 100)) TestBoth("Random 10", Random(10)) TestBoth("Random 100", Random(100)) TestBoth("Random 1000", Random(1000)) TestBoth("Random 10000", Random(10000)) TestBoth("Random 100000", Random(100000)) End Sub End Module

Well, if you want to compare the performance of 2 pieces of code you need to try to make sure that they are equivalent otherwise you may end up comparing apples and oranges.

In this particular case there is a subtle difference between InsertAllThenSort and BinaryInsert. BinaryInsert has an additional property that InsertAllThenSort does not have: it keeps the list sorted during adding.

If you move list.Sort() inside the For Each loop you may find that BinaryInsert is faster, or maybe not :). I did not tried it.

Inserting at the end of the List is mostly O(1) since you’re just filling in an pre-allocated array slot. The exception is when you exceed the list Capacity and the list grows by a factor of 2 resulting in an O(N = 2 * Previous Capacity) copy.

So, when you do insert and then sort, you’re doing a O(1) operation followed by code that does comparisons and memory swaps but no copies. The latter tends to be quicker than all the copies involved in the expected case of BinaryInsert (that is, you didn’t get lucky and couldn’t insert at the end).

This can be seen more clearly if you update your test code to this:

Sub TestBoth(Of T)(ByVal title As String, ByVal enumerable As IEnumerable(Of T))

Dim copy = New List(Of T)(enumerable)

Dim binaryInsertComparer As New CountingComparer(Of T)

Dim elapsedBinary = BinaryInsert(New List(Of T)(copy), binaryInsertComparer)

Dim insertAllThenSortComparer As New CountingComparer(Of T)

Dim elapsedSort = InsertAllThenSort(New List(Of T)(copy), insertAllThenSortComparer)

Console.WriteLine(title)

Console.WriteLine("BinaryInsert: {0} with {1:0,0} comparisons", elapsedBinary, binaryInsertComparer.TotalComparisons)

Console.WriteLine("Insert Then Sort: {0} with {1:0,0} comparisons", elapsedSort, insertAllThenSortComparer.TotalComparisons)

Dim compareDiff = insertAllThenSortComparer.TotalComparisons – binaryInsertComparer.TotalComparisons

Dim minTicks = Convert.ToDouble(Math.Min(elapsedBinary.Ticks, elapsedSort.Ticks))

Dim maxTicks = Convert.ToDouble(Math.Max(elapsedBinary.Ticks, elapsedSort.Ticks))

Console.WriteLine("Insert then sort used {0:0,0} ({1:0%}) more comparisons, but about {2:0,0} less elements were copied, making it {3:0%} {4}", _

compareDiff, _

Convert.ToDouble(compareDiff) / binaryInsertComparer.TotalComparisons, _

(Convert.ToInt64(copy.Count) * (copy.Count + 1)) / 2, _

(maxTicks – minTicks) / minTicks, _

If(elapsedSort < elapsedBinary, "faster", "slower"))

End Sub

Function Random(ByVal count As Integer) As IEnumerable(Of Integer)

Dim rand As New Random()

Return Enumerable.Range(0, count).Select(Function(x) rand.Next())

End Function

Public Class CountingComparer(Of T)

Implements IComparer(Of T)

Private ReadOnly m_ActualComparer As IComparer(Of T)

Private m_TotalComparisons As Integer

Public Sub New()

m_ActualComparer = Comparer(Of T).Default

End Sub

Public ReadOnly Property TotalComparisons() As Integer

Get

Return m_TotalComparisons

End Get

End Property

Public Function Compare(ByVal x As T, ByVal y As T) As Integer Implements System.Collections.Generic.IComparer(Of T).Compare

m_TotalComparisons += 1

Return m_ActualComparer.Compare(x, y)

End Function

End Class

Sub Main()

TestBoth("Force Jit", New Integer() {1})

TestBoth("Range (0-99)", Enumerable.Range(0, 100))

TestBoth("Random 10", Random(10))

TestBoth("Random 100", Random(100))

TestBoth("Random 1000", Random(1000))

TestBoth("Random 10000", Random(10000))

TestBoth("Random 100000", Random(100000))

End Sub

This gives:

Force Jit

Range (0-99)

BinaryInsert: 00:00:00.0002988 with 573 comparisons

Insert Then Sort: 00:00:00.0006341 with 768 comparisons

Insert then sort used 195 (34%) more comparisons, but about 5,050 less elements were copied, making it 112% slower

Random 10

BinaryInsert: 00:00:00.0000073 with 24 comparisons

Insert Then Sort: 00:00:00.0000054 with 52 comparisons

Insert then sort used 28 (117%) more comparisons, but about 55 less elements wer e copied, making it 35% faster

Random 100

BinaryInsert: 00:00:00.0000310 with 531 comparisons

Insert Then Sort: 00:00:00.0000315 with 1,010 comparisons

Insert then sort used 479 (90%) more comparisons, but about 5,050 less elements were copied, making it 2% slower

Random 1000

BinaryInsert: 00:00:00.0004601 with 8,572 comparisons

Insert Then Sort: 00:00:00.0003767 with 14,947 comparisons

Insert then sort used 6,375 (74%) more comparisons, but about 500,500 less eleme nts were copied, making it 22% faster

Random 10000

BinaryInsert: 00:00:00.0171482 with 119,058 comparisons

Insert Then Sort: 00:00:00.0046164 with 178,121 comparisons

Insert then sort used 59,063 (50%) more comparisons, but about 50,005,000 less elements were copied, making it 271% faster

Random 100000

BinaryInsert: 00:00:01.8321850 with 1,522,690 comparisons

Insert Then Sort: 00:00:00.0534849 with 2,162,251 comparisons

Insert then sort used 639,561 (42%) more comparisons, but about 5,000,050,000 less elements were copied, making it 3326% faster

The last line gives clues why going to 1000000 elements or more will take quite some time, but the overall idea is there.

Kudos for profiling your code. This was an interesting aspect to explore.

P.S. I updated your code to use the Enumerable.Range class that comes with LINQ.

Mike,

I agree with what you said about making sure you’re actually comparing the same values. I should have been more clear here. In this case I was considering the code for the following scenario.

There is a long lived search which returns chunks of data at a time in an un-ordered fashion. The display for this data is continually updated and needs to be in a sorted fashion. So I wanted to find the fastest way to insert and sort a set of values into a sorted list. That’s why I designed this performance "test" in this way.

Jeff,

Thanks for the detailed analysis there. I love the idea of shimming in a Comparison class which will acurately track the total number of comparisons used.

jaredpar,

OK, so if I understand it correctly the list where you insert stuff lives longer than BinaryInsert/InsertAndSort methods and such an insert method is called with that list and a "chunk" of data.

If my understanding is correct then the size of the chunk will make a difference. For example if the total number of values to insert is 10000 and the chunk size is 10 then BinaryInsert wins hands down (0.4ms vs 40ms). For a chunk size of 100 they’re equal and for a chunk size of 1000 InsertAndSort wins (0.08ms vs 0.4ms).

Thanks for posting this, it’s fun to do this kind of stuff.

Jeff,

"does comparisons and memory swaps but no copies"

Not be nitpicking, but what a memory swap is? Is there such a wonderful operation called memory swap that does not involve at least 2 copies? ðŸ™‚

For the record: I "stolen" the quicksort implementation from Rotor and modified it to count the swaps. The result: about 30000 swaps for an array of 10000 Int32s. That means 60000 copies which means you can make at least 6 worst case BinaryInserts in the same time.

Mike:

You’re right that a memory swap copies contents in a sense. What I was referring to was a "bulk copy" of everything in the array past the point of insertion.

If you have the list [2, 3, 5, 7] and insert "4", you’d note that you’ve exceeded the capacity of 4 elements and then the list would grow to capacity * 2 = 8: (X denotes a reserved but unused slot)

[2, 3, 5, 7, X, X, X, X]

via an Array.Copy

and then another bulk copy (Array.Copy) to give:

[2, 3, 5, 5, 7, X, X, X]

(Note the repeated 5)

and then a "swap" / update to give

[2, 3, 4, 5, 7, X, X, X]

When I said "swap", I meant the array size isn’t touched and no bulk copies are needed.

I didn’t realize that "Insert" is a bit inefficient in the design and thus does 2 bulk copies. So, the elements copied number is even worse than I reported.

What is interesting is if you know you won’t repeat elements, you can use a SortedDictionary<T, Boolean>. This under the covers will use a Red Black tree implementation. The real interesting bit is that the red black tree is still slightly slower than the insert then sort approach. This is probably because it’s not taking advantage of the speedy L1 and L2 cache for contiguous chunks of memory that Array uses. In addition, the retrieval for an index then becomes O(log n) instead of O(1).

It’s all about trade offs ðŸ™‚ The real question is how exactly will you be using it to determine what’s needed.

"I didn’t realize that "Insert" is a bit inefficient in the design and thus does 2 bulk copies. So, the elements copied number is even worse than I reported."

Not really. It only does 2 bulk copies when it needs to expand the array but since expanding is done by doubling the size one of copies can be taken out the picture. After all List.Add() does the same bulk copy when it needs to expand the array.

In the end is one bulk copy from an Insert without expand compared with the number of swaps/copies generated by quicksort which as I detailed above seems to be quite large.

And indeed it’s all about trade offs. That’s why I find this kind of stuff to be quite fun.