Riffing on Raymond – Splay Trees

Raymond's post today on splay trees (brief summary: splay trees are interesting, but when you do an in-order traversal, they degrade to a linked list) reminded me of some "fun" I had with the NT 3.1 browser.

The browser service is mostly dead these days, but in the early 1990s, the idea of finding the names of all the computers in your workgroup was flat-out cool.  Starting with Lan Manager, there was a mechanism built into the system to allow enumerating all the servers in your domain.  Windows for Workgroups expanded on that capability to allow for browsing not only the servers in your workgroup but also the workgroups on your network.  All of it was very cool.

The browser architecture was pretty straightforward.  There was a client piece and a server piece.  In the browser architecture, only certain machines functioned as active browser servers, these were the machines that collected server announcements from the network.  Clients talked to the browser servers, and retrieved the list of servers from that service.

When I implemented the browser for NT 3.1, it became very clear that the list of "servers" was going to be quite large, since every NT 3.1 machine was also a server.  There could literally be thousands and thousands of computers in a given domain, so it was important to store the list of servers in the most efficient fashion as possible.

Since the NT filesystem team had built a robust splay tree package, and since the most common operation performed on the server was processing announcements from servers, it made perfect sense to use this splay tree package - after all, announcements from machines come in random order, so the tree structure should be pretty robust.

One of the complications of the browser was that the browser had to support multiple network adapters and multiple network protocols.  A typical NT 3.1 client machine had two or three protocols running (typically TCP/IP and one of NetBEUI, IPX/SPX or XNS), the browser client had to retrieve the list of servers from the browser server for each network and merge the lists.  Since I had experience with the splay tree package, I figured it made sense to simply re-use the splay tree for the client.  To merge the lists, all I had to do was to retrieve the lists from each network and insert the entries into the splay tree that would eventually be returned to the user.

The merge process was essentially:

    while (node = SplayTreeGetNextInorderNode(TransportSpecificSplayTree))
        if (SplayTreeLookupNode(MergedSplayTree, node) == NULL)
            SplayTreeInsertNode(MergedSplayTree, node);

It sounded great on paper, and worked even better.  I was a totally happy camper...


Until some point later on in the process, as we were digging deep into the browser's performance.  We were running nightly analysis runs to ensure that the browser was functioning correctly and we noticed that sometimes retrieving the list of servers would take a LONG time - sometimes several seconds, in fact, which was totally unacceptable.

I started digging into the problem, and identified a number of problems, but I was quite surprised to realize that a significant part of the bottleneck was actually on the CLIENT.  I was really mystified about this, until I really started digging into the problem.  It turns out that the client would retrieve the list of servers and insert it into the splay tree.  Since the list of servers was maintained in a splay tree on the server, and retrieved by doing an in-order traversal of the tree, the list coming from the server was sorted alphabetically.  When I inserted all the entries in the tree, it formed a nice balanced tree (one of the characteristics of a splay tree is that it's resilient against in-order insertions).  The problem occurred when I tried to merge the list from other networks.

If you've been paying attention you see the problem.  Since the transport specific splay tree is in-order, if the server represented by "node" is already on the list, this functions as an in-order traversal of the linked list.  If the lists are disjoint, this isn't a critical problem since we'll be doing a lot of insertions, which (as I mentioned) results in a balanced tree.

But if the lists of servers are identical (or almost identical), then each insertion degrades the tree to a linear list.  That means that the nice balanced splay tree is a linked list, and the insertion algorithm listed above degrades into a O(n3) traversal.  It really wasn't pretty.  And, of course, on the Microsoft corporate network we had thousands of computers in the same domain with the exact same set of network transports.  All of a sudden my clever insertion algorithm starts looking like the root cause of a lot of performance issues.

The good news is that since all the transport specific lists were sorted in-order, I realized I could rely on this behavior and inserted a O(n) insertion sort, which sped up the client-side processing by an order of magnitude.

The moral of the story: Understand the performance characteristics of your underlying data structures BEFORE you implement them, or you're going to be sorry.


A side-note to readers unfamiliar with "O(<something>)": The O(n) is what's known as "Big-O" notation, it's a representation of the order of magnitude of an algorithm.

Comments (10)

  1. ben says:

    Isn’t it just as tenuous to rely on the fact that the lists were sorted in-order? Seems like relying on undocumented features, unless the sorting of lists WAS documented 🙂

  2. Dan McCarty says:

    Three cheers for insertion sorts on already mostly-sorted data!

  3. Ben, it wasn’t totally insertion sort. First I did a scan over the data to ensure it was sorted, and if it wasn’t, I sorted it. Since the data was almost always sorted, it wasn’t a big deal.

  4. ben says:

    Larry, I think you misdirected your last comment, I didnt mention insertion sort 🙂

  5. I’m confused… how could you use insertion sort to combine two sorted lists? Wouldn’t that be merge sort?

    // a[] and b[] must be sorted

    node[] merge(node a[], node b[])


    __ node[] m = new node[a.length + b.length];

    __ int i = 0; int j = 0; int k = 0;

    __ while (k < m.length)

    __ {

    _____ if (i < a.length && a[i] <= b[j])

    _____ {

    ________ m[k++] = a[i++];

    _____ } else

    _____ {

    ________ m[k++] = b[j++];

    __ }

    __ return m;


  6. Inside of while loop should be:

    if (i == a.length) { m[k++] = b[j++]; }

    else if (j == b.length { m[k++] = a[i++]; }

    else if (a[i] <= b[j]) { m[k++] = a[i++]; }

    else /* a[i] > b[j] */ { m[k++] = b[j++]; }

    As I had it, b[] can overflow…

  7. Maurits, you’re right, it was an merge sort.

    The critical thing was it was a O(n+m) algorithm where n and m are the lengths of the lists being sorted. Because both lists were sorted, I could be really clever about how I scanned the two lists.

Skip to main content