The performance improvement of a lock-free algorithm is often not in the locking


GWO wonders what the conditions are under which the lock-free version significantly outperforms a simple critical section.

Remember that switching to a lock-free algorithm should be guided by performance measurements. Don't trade a simple algorithm for a complex one unless you know that the simple algorithm is actually having trouble.

That said, here are some non-obvious advantages of a lock-free algorithm over one that uses a simple lock. (Later, we'll see how you can take advantage of these techniques without actually going lock-free.)

Consider a program that uses a simple critical section to manage singleton construction. Instead of a fancy lock-free algorithm, we use this much simpler version:

CRITICAL_SECTION g_csSingletonX;
X *g_px = NULL;

X *GetSingletonX()
{
    EnterCriticalSection(&g_csSingletonX);
    if (g_px == NULL)
    {
        g_px = new(nothrow) X();
    }
    LeaveCriticalSection(&g_csSingletonX);
    return g_px;
}

This simple code can run into trouble if the constructor function itself requires some locks, because now you have to impose a lock hierarchy in order to avoid a deadlock. (And this becomes impossible if the constructor function belongs to code outside your control.)
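To make the inversion concrete, here is a minimal sketch (using portable std::mutex rather than CRITICAL_SECTION; all names are hypothetical): if one thread takes the singleton lock and then a lock the constructor needs, while another thread takes them in the opposite order, each ends up waiting on the other. The hierarchy rule says every code path must take the locks in the same order.

```cpp
#include <mutex>

std::mutex g_singletonLock; // stands in for g_csSingletonX
std::mutex g_innerLock;     // hypothetical lock that X's constructor needs

// Deadlock scenario the lock hierarchy is designed to prevent:
//   Thread 1: g_singletonLock -> (inside X::X) g_innerLock
//   Thread 2: g_innerLock -> (calls GetSingletonX) g_singletonLock
// Each thread holds the lock the other one wants, and neither can proceed.

// The hierarchy fix: every code path agrees to take g_singletonLock
// before g_innerLock. You cannot enforce this agreement if X's
// constructor lives in code outside your control.
void RespectHierarchy()
{
    std::lock_guard<std::mutex> outer(g_singletonLock);
    std::lock_guard<std::mutex> inner(g_innerLock);
    // ... construct the singleton ...
}
```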

When working out what your lock hierarchy should be, you may discover that you need to consolidate some locks. This avoids the inversion problem, but it also reduces your lock granularity. You might decide to use a single lock to cover all singletons, only to discover later that you also have to extend the lock that protects X's constructor to cover other operations on X.

CRITICAL_SECTION g_csCommon;
Y *g_py = NULL;

// (updated to remove double-check lock because that just raises
// more questions that distract from the point of the article)
X *GetSingletonX()
{
    EnterCriticalSection(&g_csCommon);
    if (g_px == NULL)
    {
        g_px = new(nothrow) X();
    }
    LeaveCriticalSection(&g_csCommon);
    return g_px;
}

Y *GetSingletonY()
{
    EnterCriticalSection(&g_csCommon);
    if (g_py == NULL)
    {
        g_py = new(nothrow) Y();
    }
    LeaveCriticalSection(&g_csCommon);
    return g_py;
}

void X::DoSomething()
{
    EnterCriticalSection(&g_csCommon);
    .. something ..
    LeaveCriticalSection(&g_csCommon);
}

Over time, your quiet little singleton lock has turned into a high-contention lock in your system.

One nice thing about a lock-free algorithm is that since there is no lock, it can't create inversion in a lock hierarchy. (Of course, you have to be careful not to use the interlocked operations to build a private lock, because that puts you back where you started.)

Another nice consequence of a lock-free algorithm is that, since there is no lock, you don't have to handle the WAIT_ABANDONED case. The data structure is never inconsistent; it passes atomically from one consistent state to another. Therefore, there's no need to write code to clean up leftover inconsistency. This came in handy in a case we looked at earlier, so that an application which crashes at an inopportune time will not corrupt the shared data and require a server reboot.
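For reference, the lock-free counterpart to the singleton above can be sketched with a compare-exchange publish. This is a portable illustration using C++11 std::atomic rather than the Win32 interlocked functions; the structure X and its member are hypothetical stand-ins:

```cpp
#include <atomic>
#include <new>

// Hypothetical singleton type standing in for X from the article.
struct X
{
    int value = 42;
};

std::atomic<X*> g_px{nullptr};

X *GetSingletonX()
{
    // Acquire load: if we observe a non-null pointer, we also observe
    // the fully constructed object it points to.
    X *p = g_px.load(std::memory_order_acquire);
    if (p == nullptr)
    {
        X *candidate = new (std::nothrow) X();
        // Release CAS: publish our object only if nobody beat us to it.
        if (g_px.compare_exchange_strong(p, candidate,
                                         std::memory_order_release,
                                         std::memory_order_acquire))
        {
            p = candidate;
        }
        else
        {
            delete candidate; // lost the race; p now holds the winner
        }
    }
    return p;
}
```

Note that a thread that loses the race throws away its extra object rather than blocking, which is why there is no lock to abandon and no inconsistent intermediate state.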

Comments (9)
  1. Anonymous says:

    A lock-free algorithm can make the multithreaded world simpler, provided every thread plays the game the same way.

  2. Anonymous says:

    Wow — this is the exact same question I just asked on StackOverflow -> stackoverflow.com/…/are-lock-free-algorithms-really-more-performant-than-their-lock-full-counterparts

  3. Anonymous says:

    There are also cases where a lock-free algorithm performs slower than its locked counterpart, like a lock-free queue versus a locked std::deque.  The lock-free algorithm will scale better and so with enough contention will perform better, but it's definitely not for everybody and needs careful profiling.

    In fact the ConcurrentQueue in .NET is actually not lock-free, but borrows some of the techniques to get a better performing deque with better scalability than a locked queue, while not having any of lock-free's other properties.  Most apps don't actually require all the lock-free properties and just want something that performs well in the average case, so this isn't really a problem.

  4. Anonymous says:

    For the case of a singleton, you can almost always ensure that initialization happens before the first call to the getter (and after initialization becomes possible), with a little attention to detail and a willingness to violate local concerns.

  5. Anonymous says:

    That singleton implementation looks dubious to me: g_px can be published before the object it points to is fully constructed, and a concurrent caller might end up using an object that isn't yet fully initialized. So that's another benefit of lock-free algorithms: they avoid the difficult memory-model issues. Or you would have to place a fence/memory barrier somewhere in there to resolve the issue I think is present.

  6. Anonymous says:

    @Niclas Lindgren: Critical sections (and some other functions too) use the appropriate barriers to ensure memory ordering, as can be seen here:

    msdn.microsoft.com/…/ms686355%28v=vs.85%29.aspx

  7. Anonymous says:

    @suryo: Yes, inside and after the critical section is safe; however, the read of g_px before the section is not. Nothing prevents the compiler and/or CPU from publishing the pointer g_px before it has published the entire object it points to. That means a read of g_px before the section can cause another thread to grab the pointer and start using an object that isn't fully constructed. There needs to be a fence between the publication of g_px and the construction/publication of the object itself. I still think my concern about the code above is valid.

    Also, I somewhat made it sound as though you don't need to worry about memory models for lock-free algorithms; it's actually the very opposite. You need to pay close attention to them, but since you know you are doing lock-free work, you are very aware of them.

    If you use locking algorithms, don't try to be clever; it will almost always fail. If memory is shared, any access to it should be protected if you need the access to return correct data (the same data a single-threaded program would see). If you don't need correct values, for some reason, then you can relax such things.

    [I already covered this topic. The dangers of double-checked locking weren't the point of the article. I'll just delete the code. -Raymond]
  8. chrismcb says:

    Don't you have an issue with initializing the CriticalSection?

  9. Anonymous says:

    @ChrisMcB

    Once again, initialization of locks is obviously not the point of the article.

Comments are closed.