Linked List Performance Issues? Skip List to the Rescue!

A Traditional Linked List

At some point in every developer’s life, you’ll have to write some code that works with a Linked List. You’ll start by making a structure like this:

typedef struct
{
ULONG64 ulStartAddress;
ULONG64 ulEndAddress;
MemoryNode* pNext;
} MemoryNode;

And you’ll populate it… (In this example, I’m using the DbgEng API to iterate over a target address space for memory regions.)

HRESULT LoopMemInfo(IDebugDataSpaces2* pDebugDataSpaces2, MemoryNode** ppNodeRoot)
{
if ((pDebugDataSpaces2 == NULL) || (ppNodeRoot == NULL))
return E_INVALIDARG;

*ppNodeRoot = NULL;

ULONG64 ulPtr = 0;
MEMORY_BASIC_INFORMATION64 memInfo;
  MemoryNode* pPrev = NULL;
while (SUCCEEDED(pDebugDataSpaces2->QueryVirtual(ulPtr, &memInfo)))
{
MemoryNode* pNode = new MemoryNode;
if (*ppNodeRoot == NULL)
{
*ppNodeRoot = pNode;
}
else
{
pPrev->pNext = pNode;
}
pPrev = pNode;
pNode->ulStartAddress = memInfo.BaseAddress;
pNode->ulEndAddress = memInfo.BaseAddress + memInfo.RegionSize;
pNode->pNext = NULL;
ulPtr = memInfo.BaseAddress + memInfo.RegionSize;
}
return S_OK;
}

After that, you’ll want to interact with the linker list, usually in a search type fashion.

HRESULT Search(MemoryNode* pNodeRoot, MemoryNode** ppNode, ULONG64 addr)
{
if ((pNodeRoot == NULL) || (ppNode == NULL))
return E_INVALIDARG;

*ppNode = NULL;

ULONG64 ulPtr = 0;
MEMORY_BASIC_INFORMATION64 memInfo;
  MemoryNode* pNode = pNodeRoot;
while (pNode != NULL)
{
if ((addr >= pNode->ulStartAddress) && (addr <= pNode->ulEndAddress))
{
*ppNode = pNode;
return S_OK;
}

if (addr < pNode->ulStartAddress)
{ // We have passed the point where it can be as the list is in order
break;
}
pNode = pNode->pNext;
}
return S_FALSE;
}

The problem is, this code can run quite slow. For example, lets say that we modeled 200,000 memory regions.  (As you can probably guess, yes, this is based on a personal real-world observation). And lets say that you wanted to mark each a region that contains a particular address as being ‘used’. In a worse case scenario, you would need to traverse 200,000 pNext pointers to find out the region doesn’t exist (e.g. you search for 0xffffffff’ffffffff).  That’s really expensive if you need to do the search 100,000+ times (e.g. for every pointer in a stack – 1Mb/8).  That’s a total of 200,000,000,000 (200 trillion) comparison operations!

The interesting thing about this linked list is that it is ordered by virtue of the way it was built, and it will not experience any type of insertion or deletion after its creation. If you have a Linked List like this, a Skip List can be use to help speed up searches considerably!

Skip List

A Skip List works the same way as the various tokens work in the board game Scotland Yard. You jump a great distance with the Underground (train) tokens, a few streets with the Bus tokens, and then individual streets with the Taxi tokens. By keeping additional ‘Next’ pointers to varying distances forward in the list, you can ‘skip’ past a huge number of nodes with just a few jumps.

The trick is to work out the number of skip pointers to have versus the cost of checking each skip pointer in each destination reached. If you take this to the n’th degree, you could have 200,000 skip pointers per node (as per my example), but it it would cost 199,999 ‘if-then’ statements to determine that you only need to skip forward one node (in the worse case) – the last statement being the only ‘if-then’ statement that succeeds.

Some basic math generates a pretty good idea of the size of the jump you’ll get by having N skip pointers (there is no real mathematical theorem here behind what the size of the jumps should be – I’m just using even size jumps as that seems a good way of making an ‘average’ approach). What you do is, get the expected data set size (in my case, somewhere between 10,000 and 1,000,000 nodes) and calculate the square, cube and quad roots of the highest value.

1,000,000 Square Root = 1,000x1,000 | Worse Case = ~ 1,000+1,000 = ~2,000
1,000,000 Cube Root = 100x100x100 | Worse Case = ~ 100+100+100 = ~300
1,000,000 Quad Root = 32x32x32x32 | Worse Case = ~ 32+32+32+32 = ~128

The approximate worse-case cost is the sum of the jump sizes. So if you need to traverse 1,000,000 nodes to the last node, it would take 1million jumps if you have 1 skip (next) pointer, ~2,000 jumps if you have 2 skip pointers (0.2% cost), ~300 if you have 3 skip pointers (0.03% cost), and ~128 if you have 4 skips pointers (0.01% cost).  At a certain point, the increase in the number of skip pointers passes the benefit of adding them. In the case of 1,000,000 the magic number is around 9 (I think), but the benefits are seen around 3 or 4.

Now my math isn’t very accurate as it hasn’t accounted for the conditional statements that fail along the way. That is, when you get to the last big jump, you check for this jump in all subsequent sub-nodes. Thus the cost is more like this (I think):

1,000,000 Square Root = 1,000x1,000 | Worse Case = (1,000)+(1,000+999) = 2,999
1,000,000 Cube Root = 100x100x100 | Worse Case = (100)+(100+99)+(100+99+99) = 597
1,000,000 Quad Root = 32x32x32x32 | Worse Case = (32)+(32+31)+(32+31+31)+(32+31+31+31) = 314
1,000,000 Quin Root = 16x16x16x16x16 | Worse Case = (16)+(16+15)+(16+15+15)+(16+15+15+15)+(16+15+15+15+15) = 230

The savings are still huge. And from experience, I can tell you that this reduced my application’s search from 20 minutes to under 100 msec. Compared to B-Tree storage, you’ll get a faster search, plus you don’t have to worry about rebalancing the tree to solve the right-tree only insertion issue (which you’ll get with linear data insertion).

So what does the code now look like?

Here’s the modified structure using 4 jump pointers:

typedef struct
{
ULONG64 ulStartAddress;
ULONG64 ulEndAddress;
Node* pNext4; // Skips 32k forward
Node* pNext3; // Skips 1k forward
Node* pNext2; // Skips 32 forward
Node* pNext1; // Skips 1 forward
} MemoryNode;

Here’s the modified list builder using 4 jump pointers:

HRESULT LoopMemInfo(IDebugDataSpaces2* pDebugDataSpaces2, MemoryNode** ppNodeRoot)
{
if ((pDebugDataSpaces2 == NULL) || (ppNodeRoot == NULL))
return E_INVALIDARG;

*ppNodeRoot = NULL;

ULONG64 ulPtr = 0;
MEMORY_BASIC_INFORMATION64 memInfo;
  MemoryNode* pPrev4 = NULL;
MemoryNode* pPrev3 = NULL;
MemoryNode* pPrev2 = NULL;
MemoryNode* pPrev1 = NULL;
ULONG64 ulCount = 0;
while (SUCCEEDED(pDebugDataSpaces2->QueryVirtual(ulPtr, &memInfo)))
{
MemoryNode* pNode = new MemoryNode;
if (ulCount == 0)
{
*ppNodeRoot = pNode;
pPrev1 = pNode;
pPrev2 = pNode;
pPrev3 = pNode;
pPrev4 = pNode;
}

if (ulCount >= 1) { pPrev1->pNext1 = pNode; pPrev1 = pPrev1->pNext1; }
if (ulCount >= 32) { pPrev2->pNext2 = pNode; pPrev2 = pPrev2->pNext1; }
if (ulCount >= 1024) { pPrev3->pNext3 = pNode; pPrev3 = pPrev3->pNext1; }
if (ulCount >= 32768) { pPrev4->pNext4 = pNode; pPrev4 = pPrev4->pNext1; }

pNode->ulStartAddress = memInfo.BaseAddress;
pNode->ulEndAddress = memInfo.BaseAddress + memInfo.RegionSize;
pNode->pNext4 = NULL;
pNode->pNext3 = NULL;
pNode->pNext2 = NULL;
pNode->pNext1 = NULL;
ulPtr = memInfo.BaseAddress + memInfo.RegionSize;
ulCount++;
}
return S_OK;
}

Here’s the modified search using 4 jump pointers:

HRESULT Search(MemoryNode* pNodeRoot, MemoryNode** ppNode, ULONG64 addr)
{
if ((pNodeRoot == NULL) || (ppNode == NULL))
return E_INVALIDARG;

*ppNode = NULL;

MEMORY_BASIC_INFORMATION64 memInfo;
  MemoryNode* pNode = pNodeRoot;
while (pNode != NULL)
{
if ((pNode->pNext4 != NULL) && (addr >= pNode->pNext4->ulStartAddress))
{
pNode = pNode->pNext4;
continue;
}

if ((pNode->pNext3 != NULL) && (addr >= pNode->pNext3->ulStartAddress))
{
pNode = pNode->pNext3;
continue;
}

if ((pNode->pNext2 != NULL) && (addr >= pNode->pNext2->ulStartAddress))
{
pNode = pNode->pNext2;
continue;
}

if ((pNode->pNext1 != NULL) && (addr >= pNode->pNext1->ulStartAddress))
{
pNode = pNode->pNext1;
continue;
}

if ((addr >= pNode->ulStartAddress) && (addr <= pNode->ulEndAddress))
{
*ppNode = pNode;
return S_OK;
}

// We haven’t found it based on our knowledge of the list order
break;
}
return S_FALSE;
}

* Note: I just typed this code off the top of my head; I’ve never compiled or tested it. It could have bugs in it. No warranty of any type is given for its use.

So there you have it, an example implementation of a Skip List.  Note though, there are lots of caveats to its use:

  • The data being modeled needs to linear
  • The data being modeled needs to comparable
  • Insertion is done all-at-once (so the cost of generating the pointers is low)
  • An unconstrained amount of memory is available (you can afford the cost of the additional pointers)

Skips Lists are a simple concept, and aren’t too hard implement.  I’d love to hear about your success stories.  They fixed a massive performance problem within my own code without causing too much disruption to the applications linked-list design.