Is it okay to acquire an SRWLOCK recursively? (And why not?)


A customer was using an SRWLOCK to protect access to an object. All functions that use the object acquire the SRWLOCK in shared mode, and the function that destroys the object acquires the SRWLOCK in exclusive mode.

This seems correct to us, but we found that when running under the Application Verifier, we receive many errors complaining that our code is recursively acquiring the SRWLOCK. This happens, for example, when a function acquires the SRWLOCK in shared mode, and then calls another function which also acquires the SRWLOCK in shared mode. We completely understand why Application Verifier might warn of recursive exclusive lock acquisition, but why does it also complain about recursive shared lock acquisition?

Before we dig in and try to fix this, can you confirm that is a real problem? Or is this an oversight in Application Verifier?

Application Verifier is correct to complain. As noted in the documentation, SRWLOCK objects cannot be acquire recursively. This applies both to shared and exclusive acquisition.

The technical reason is that the SRWLOCK was designed to be fast and require no dynamic memory allocation. In order to accomplish this, many potential features had to be sacrificed, among them, recursive acquisition and lock promotion from shared to exclusive.

If you want a synchronization object that supports recursive acquisition, you might want to try a CRITICAL_SECTION, or build your own data structure around SRWLOCK that also keeps track of each thread's recursive acquisition count.

The customer replied,

Okay, so it's clear that we need to fix this. Our next question, then, is how urgent do we need to deploy this fix? Is this an actual broken scenario, or is it merely a theoretical possibility? In other words, do we need to issue a patch for it right now, or can we wait until our next major version?

Two of my colleagues shared their experiences:

We encountered this issue in our own product. The conclusion of the investigation was that this is a critical error if recursive acquisition is indeed occurring.

We hit a deadlock in production due to erroneous recursive acquisition. It is fiendishly difficult to debug. I would put it in the "immediate fix" category.

The customer thanks us for the information and began working on a fix.

Comments (23)
  1. Joshua says:

    Multithreading 102: RW locks can deadlock if you take them while holding them because a writer attempting to acquire blocks all farther readers.

    1. alegr1 says:

      Also Engineering 101: If the documentation tells you “don’t do that”, DON’T DO THAT!

      1. poizan42 says:

        Reality 101: No one reads the documentation

        1. Nico says:

          Reality 201: Nobody writes the documentation

          1. Ian says:

            Reality 404: Couldn’t find the documentation. :-)

        2. McBucket says:

          Reality 201: Reality bites (those who don’t read the documentation).

  2. Roman says:

    > All functions that use the object acquire the SRWLOCK in shared mode, and the function that destroys the object acquires the SRWLOCK in exclusive mode.

    This sounds broken already! If the destructor needs to acquire the lock, then that implies that another thread might attempt to use the object while it’s being destroyed.

    1. Henri Hein says:

      Yes, exactly. I’ve worked on project code where synchronization objects appeared on the stack. Unless it’s some top-level function, it gives me a strong sense the coder don’t really grok how synchronization works.

    2. Karellen says:

      What makes you think that “the function that destroys the object” is “the destructor”. I read that as meaning that “the function that destroys the object” is the one that calls “delete” (or equivalent) on the object.

      Given that there is no complement to InitializeSRWLock() it should be safe to grab the lock exclusively, then delete the object. Once you have the lock, no other functions can use the object.

      If another function has a reference to the object after you’ve grabbed the exclusive lock, and tries to get a shared lock, it will block indefinitely because the lock will never be released – but at that point your ownership semantics have gone screwy anyway. If you’re deleting an object that other threads still have references to that they think are live, something else has already gone wrong. The type of lock you use won’t save you.

      1. Henri Hein says:

        > but at that point your ownership semantics have gone screwy anyway

        That’s precisely the point. By “destroying” the SRWLOCK, I assume they mean freeing or deleting the memory it points to. Why would it need to acquire the lock before doing that? Nothing else should be referencing that lock at that time.

        1. Joshua says:

          I don’t see why you’d acquire it, but if you happen to be already holding it at the time of destroy object, there’s no reason to release it either.

          The operation went:

          Acquire parent
          Acquire object
          Do something so that object could no longer be found from parent
          Release parent
          If we are the last holder of a reference to the object
          Destroy object

          Where Acquire and Release kept a visible counter so “If we are the last” was trivial.

          1. Henri Hein says:

            I’m honestly not sure what you mean by “parent” and “object.” The issue is recursively acquiring a slim lock. That indicates only one lock was involved. At least in the issue at hand.

            From the description: “the function that destroys the object acquires the SRWLOCK in exclusive mode.” From a purely technical reading of this, maybe acquiring the lock and destroying it are disconnected in time, but the sentence strongly implies the object is acquired just before destroying it, as though that was necessary. If they need to acquire the lock before they destroy it, some of their other threads are going to be surprised.

          2. Joshua says:

            It’s merely that in this case the path that would destroy a slim lock happens to be already holding the lock, and furthermore there is no need to release the lock prior to destroying it.

          3. smf says:

            >I’m honestly not sure what you mean by “parent” and “object.” The issue is recursively acquiring a slim lock.
            > That indicates only one lock was involved. At least in the issue at hand.

            parent is the class that has the lock on it, object is the one that is being allocated and freed.

            >This sounds broken already! If the destructor needs to acquire the lock, then that implies that another thread might attempt to use the object while it’s being destroyed.

            I’m not sure it is the destructor in this case, but a “function that destroys the object”. However if you have an object that spawns a thread and then later something calls delete on the object, then you have exactly that situation. Waiting until the object isn’t being used before it gets free’d is probably the only reason they added the lock in the first place.

        2. They might mean that they’re destroying the object protected by the lock, but not necessarily destroying the lock itself. (For example, the lock might later be reused by a new iteration of the same object.)

    3. To me it sounds like an example of ‘self synchronized destruction’. Consider an object where the shared lock count is like the ref count on a ref counted object: if somebody posesses a reference, then it must be >0. This means that the point at which you can acquire exclusive access is also the point at which nobody else is holding a handle to the object.

      Such use cases are odd, but not unprecedented.

  3. Henri Hein says:

    There is a sample here that demonstrates shared locks with the C++ 11 STL synchronization objects: https://code.msdn.microsoft.com/Multiple-Readers-One-34bb3ac1

    No doubt SRWLOCK is leaner than the STL ones, but if you need cross-platform, the STL ones are preferred.

  4. I presume this would have been easy to fix. Just replace SRWLOCK with your own gizmo that uses SRWLOCK to do the cross-thread locking, but with a per-thread usage counter to handle the recursive case. Or am I missing something?

    1. dave says:

      >Am I missing something?

      At that point your slim lock has gained a lot of weight, and you might as well use one of the existing locks that support recursive acquisition.

    2. Where do you plan on putting the per-thread usage count? TLS? Or are you willing to raise an out-of-memory exception from AcquireShared?

  5. This design is fundamentally broken.

    Proper locking primitive here is akin to rundown references used available for Windows drivers. On a high level, if you have a ptr to object, you use the object as much as you want. If you want to make ptr escape your scope, you acquire a rundown ref — which can fail, which means the object is getting destroyed. The destroyer alters the state of rundown structure so further acquires fail, and waits for refcount of valid pointer holders to drain to zero.

  6. Ivan K says:

    I remember using pthread_mutex or something pthread of the sort for the first time and encountering it’s non-recusive nature. The specific OS had “_r” vetsions that would work the same as windows critical sections, but it was easy to change things to work. (private method with a precondition that the lock is held before calling.)

Comments are closed.

Skip to main content