Lock-free algorithms: Update if you can I’m feeling down


A customer was looking for advice on this synchronization problem:

We have a small amount of data that we need to share among multiple processes. One way to protect the data is to use a spin lock. However, that has potential for deadlock if the process which holds the spinlock doesn't get a chance to release it. For example, it might be suspended in the debugger, or somebody might decide to use Terminate­Process to nuke it.

Any suggestions on how we can share this data without exposure to these types of horrible failure modes? I'm thinking of something like a reader takes the lock, fetches the values, and then checks a status at the end to tell if the data is valid. Meanwhile, a writer tries to take the lock with a timeout, and if the timeout fires, then the writer just goes ahead anyway and updates the values, and somehow sets the status on the reader so it knows that the value is no good and it should try again. Basically, I don't want either the reader or writer to get stuck indefinitely if a developer, say, just happens to break into the debugger at the worst possible time.

This can be solved with a publishing pattern. When you want to update the values, you indicate that new values are ready by publishing their new location.

Let's say that the data that needs to be shared is a collection of four integers.

struct SHAREDDATA {
 int a, b, c, d;
};

Assume that there is a practical limit on how often the value can change; this is usually a safe assumption because you'll have some sort of external rate limiter, like "This value changes in response to a user action." (Even if there is no such limit, most solutions will simply posit one. For example, the SLIST functions simply assume that a processor won't get locked out more than 65535 times in a row.) In our case, let's say that the value will not change more than 64 times in rapid succession.

#define SHAREDDATA_MAXCONCURRENT 64

SHAREDDATA g_rgsd[SHAREDDATA_MAXCONCURRENT];
LONG g_isd; // current valid value

void GetSharedData(__out SHAREDDATA *psd)
{
 *psd = g_rgsd[g_isd];
}

Reading the data simply retrieves the most recently published value. The hard part is publishing the value.

Actually, it's not hard at all.

LONG g_isdNext = 1;

void UpdateSharedData(__in const SHAREDDATA *psd)
{
 UINT isd = (UINT)InterlockedIncrementAcquire(&g_isdNext);
 isd %= SHAREDDATA_MAXCONCURRENT;
 g_rgsd[isd] = *psd;
 InterlockedExchange(&g_isd, isd);
}

Publishing the data is a simple matter of obtaining a slot for the data, using InterlockedIncrement to get a unique location to store the data, or at least unique until SHAREDDATA_MAXCONCURRENT - 1 intervening publications have occurred. We store our results into the memory we obtained and then publish the new index. The publication needs to be done with release semantics, but since there is no InterlockedExchangeRelease, we just do a full barrier exchange.
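
For comparison, here is a minimal sketch of the same pattern written with C++11 atomics, where the acquire ordering on the read and the release ordering on the publish can be expressed directly. (This is an aside, assuming a compiler that provides <atomic>; the names with a 2 suffix are invented here so they don't clash with the globals above.)

#include <atomic>

SHAREDDATA g_rgsd2[SHAREDDATA_MAXCONCURRENT];
std::atomic<unsigned> g_isd2{0};     // index of the current valid value
std::atomic<unsigned> g_isdNext2{1}; // next slot to claim

void GetSharedData2(SHAREDDATA *psd)
{
 // Acquire: everything the writer stored before publishing is visible here.
 *psd = g_rgsd2[g_isd2.load(std::memory_order_acquire)];
}

void UpdateSharedData2(const SHAREDDATA *psd)
{
 // Claim a slot; it remains ours until SHAREDDATA_MAXCONCURRENT - 1
 // intervening publications have occurred.
 unsigned isd = g_isdNext2.fetch_add(1) % SHAREDDATA_MAXCONCURRENT;
 g_rgsd2[isd] = *psd;
 // Release: the index is published only after the data is fully written.
 g_isd2.store(isd, std::memory_order_release);
}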

Note that the update is not atomic with the read. A processor can call GetSharedData, revise the values, then publish them, only to find that it overwrote a publication from another processor. If the new values are dependent on the old values (for example, if they are a running total), then you just lost an update.

Note also that if two threads try to update at the same time, it's pretty much random which set of values you get, since the last writer wins.

It so happens that in this particular case, the new values had nothing to do with the old values, so there was no problem with lost updates. And in practice, only one process updated the values at a time. (There is a master controller who updates the values, and everybody else just reads them.) Therefore, this simple method meets the requirements.

Exercise: How would you adapt this solution if the new values depended on the old values?

Comments (19)
  1. Gabe says:

    If the new values depend on the old values, you just have to check to make sure that the version that was current when you read the data is still the current version when you publish your data. If it's not, start over.

    I believe that this is the situation that InterlockedCompareExchange was designed for.

  2. Zarat says:

    Doesn't GetSharedData need some sort of memory barrier or something to ensure it doesn't keep reading cached values of g_isd? Or is this somehow different than the Erratum from the singleton constructor article?

    blogs.msdn.com/…/10150261.aspx
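
    A sketch of one way to get the acquire semantics Zarat is asking about (an illustration only; the exact helper from the earlier article may differ): read the index with a compare-exchange whose exchange value equals its comparand, so at worst it writes back the same value, and the read carries acquire ordering. GetSharedData would then become:

    LONG InterlockedReadAcquire(__in LONG volatile *pl)
    {
        // Compares *pl against 0 and, at worst, writes back the same 0;
        // the value never changes, but the read has acquire semantics.
        return InterlockedCompareExchangeAcquire(pl, 0, 0);
    }

    void GetSharedData(__out SHAREDDATA *psd)
    {
        *psd = g_rgsd[InterlockedReadAcquire(&g_isd)];
    }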

  3. Adam Rosenfield says:

    If the new values depended on the old values, use the try/commit/try again pattern.  However, the code needs to be modified so that the computation of the next datum occurs inside UpdateSharedData so that the data appears in the correct order in the array.  Something like this should work:

    void UpdateSharedData(SHAREDDATA (*ComputeNextDatum)())
    {
       UINT orig_isd, isd;
       do
       {
           orig_isd = (UINT)g_isdNext;  // or Raymond's InterlockedReadAcquire from yesterday
           isd = (orig_isd + 1) % SHAREDDATA_MAXCONCURRENT;
           g_rgsd[isd] = ComputeNextDatum();
       } while(InterlockedCompareExchange(&g_isdNext, isd, orig_isd) != orig_isd);
    }

    This assumes that the computation of the next datum is relatively fast compared to the frequency with which multiple writers will try to call UpdateSharedData.  If you have high contention, you'll waste a lot of time looping and recomputing here, in which case it's better just to use a lock and only have each call do one computation instead of potentially many.

  4. Groosalugg says:

    I like to use a seqlock with a read-only shared memory section for that kind of lock-free writer->reader publishing.

    Especially if I'm designing the writer and someone else is designing the reader – There's no way for them to mess up my process by crashing or freezing at an inopportune time.
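
    In case the term is unfamiliar, here is a rough sketch of the seqlock idea Groosalugg mentions (an illustration with made-up names; his actual design may differ). The writer bumps a sequence counter to an odd value before updating and back to an even value afterwards; a reader retries whenever the counter was odd or changed during its read, and never needs write access to the shared section.

    LONG volatile g_sequence;  // even = stable, odd = update in progress
    SHAREDDATA g_sd;           // mapped read/write by the writer, read-only by readers

    void SeqlockWrite(const SHAREDDATA *psd)  // single writer assumed
    {
        InterlockedIncrement(&g_sequence);    // counter becomes odd
        g_sd = *psd;
        InterlockedIncrement(&g_sequence);    // counter becomes even again
    }

    BOOL SeqlockTryRead(SHAREDDATA *psd)
    {
        LONG seqBefore = g_sequence;
        if (seqBefore & 1) return FALSE;      // update in progress; try again
        MemoryBarrier();
        *psd = g_sd;
        MemoryBarrier();
        return g_sequence == seqBefore;       // FALSE if an update intervened
    }

    Callers loop until SeqlockTryRead succeeds; because readers never write, a reader that is suspended in a debugger or terminated can never wedge the writer.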

  5. Pierre B. says:

    Shouldn't "InterlockedExchange(&g_isdNext, isd);" be "InterlockedExchange(&g_isd, isd);" ?

    [Yes. Last-minute editing strikes again. -Raymond]
  6. Gabe says:

    Alex: There are plenty of scenarios where these requirements are valid. I'll give a mostly-write case and a mostly-read case.

    Assume you have a multi-process web server (like how Apache works). You want to have a single file to log accesses and want to buffer the log file writes, but you don't want a separate log buffer for each process because that could make the log file entries wildly out of order. The solution is to have a single shared buffer that every child process writes to. The controlling process will kill a child process every Nth request or after consuming too much CPU, so the child process can get terminated at any time — and of course a process might need to be debugged. Doing either of these common tasks should not deadlock your web server.

    Another example is a web browser. You have multiple browser processes running different tabs in your browser, and they all need to share session cookies and site credentials. An easy way to do that is with shared memory. Since I kill and debug browser processes every day, I would really hate it if doing either of those everyday things locked up my browser!

  7. David Walker says:

    It seems like this issue, where any code that acquires a lock can fail or be terminated before releasing it, could happen all the time. (Presumably, when you're debugging, you don't want locks that the debugged code is holding to be ignored or forcibly released.)

    Therefore, obtaining locks is a bad idea.  

    Obviously that's not true, so I suppose all code that acquires and releases a lock just needs to be robust and well-debugged. Or am I missing something?

    [Is your lock shared across processes? Robustness requirements for cross-process locks tend to be higher than for locks within a process. -Raymond]
  8. John Muller says:

    @Gabe: For the multi-process server, wouldn't it be easy enough to tack a time code onto each log entry, so the logs can be trivially merged in order? (Unless the events occur so close together that they share a time interval.)

    In the last-writer-wins scenario, couldn't some really bad stuff happen if the writes are performed in different orders? You might get half of one update and half of another; you would have to ensure that the writes always occur in the same sequence.

  9. Alex Grigoriev says:

    @Gabe:

    Please read about FILE_APPEND_DATA and how it solves the problem of multiple writers to a file, even over separate handles (a sketch appears below).

    Also, if your processes simply share a non-overlapped handle to the log file, WriteFile operations will be serialized even with GENERIC_WRITE.

    Regarding the browser. Frigging IE8/9 solves this very wrong way. After a crash it simply loses all saved credentials. Handy, isn't it?
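
    A rough sketch of the append-only approach Alex describes above (the file name and helper name are made up for illustration): opening the log with FILE_APPEND_DATA as the desired access makes each WriteFile an atomic append at the current end of the file, even when many processes hold separate handles.

    void AppendLogEntry(const char *entry, DWORD cbEntry)
    {
        HANDLE hLog = CreateFileA("server.log", FILE_APPEND_DATA,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                  OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hLog != INVALID_HANDLE_VALUE) {
            DWORD cbWritten;
            // The write lands at the current end of file, atomically.
            WriteFile(hLog, entry, cbEntry, &cbWritten, NULL);
            CloseHandle(hLog);
        }
    }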

  10. Gabe says:

    David: Yes, obtaining locks is a bad idea. :) Of course there are times when it is unavoidable and you just have to make sure you're careful then (hold locks for the least amount of time necessary, always acquire locks in the same order, and so on). Also, most locks are not shared across processes so this isn't very common. In one situation where I was sharing semaphores across processes, terminating one of them was catastrophic. In another situation I was sharing a mutex and terminating one of the processes was fine.

    @John: Would you want to have to post-process a 10GB web server access log before you could make sense of it? I wouldn't. Also, log files generally have granularity of 1 second but can have thousands of requests per second so order really does matter. When you use a last-writer-wins algorithm, you probably want to have your writes always be atomic to avoid seeing partial writes.

    @Alex: You're suggesting calling WriteFile for each log entry. I specifically said "…buffer the log file writes…" to preclude your suggestion. Why on Earth would you want to buffer writes to the log file? For the same reason you'd want to buffer writes to any file: you're doing thousands of writes per second and you want to spend your CPU and disk resources serving up web pages and not handling bookkeeping for the log file.

  11. Alex Grigoriev says:

    @Gabe:

    Remember that the disk files are opened by default in buffered mode. And if your bottleneck is in writing the logs, you're doing it wrong. Anyway, it depends on what logging guarantees you want. If you want to guarantee that the record was logged, even if the issuing process died immediately after that, you may not want any intermediate buffering, just a WriteFile per record. If you're OK with occasional record loss on process termination, buffer a few records and then issue WriteFile for a number of whole records only.

    "In another situation I was sharing a mutex and terminating one of the processes was fine." If you think it was fine, you didn't really need the mutex. If you really needed the mutex, it was not really fine. In either case, you're mistaken.

  12. Andrew says:

    In your example code how does g_isd used by GetSharedData() get updated? From what I can see it will always be '1'? As Pierre said above, shouldn't the final InterlockedExchange update g_isd not g_isdNext?

  13. Alex Grigoriev says:

    Ugh, the problem description is as vague as it could be, and the approach as wrong as it could be.

    I assume the customers used a shared memory section, since they're talking about multiple processes. Of course, they should not even have considered a spinlock. Only a mutex will work reliably.

    TerminateProcess or debugger is not a real life scenario and there should not even be a requirement for robustness against that. This is not a continuable failure. In case of TerminateProcess, other processes need to take note of STATUS_WAIT_ABANDONED and bail out also. A complete restart of the package or other appropriate recovery should be performed. In case of non-mission-critical application, other components should just go along, but NOT TOUCH the shared data, as it can be inconsistent.

    They could also fashion a handmade shareable equivalent of CRITICAL_SECTION. But it would be harder (although still possible) to detect an abandoned wait.

    The problem description doesn't say what the usage case is. Is it "write seldom, read often"? Does it require a consistent view in all clients? Does it allow one client to work on stale state? If not, then publishing won't work. How many clients can write (one or all), how many clients can read (one or all)? These are all factors that affect the choice of a solution.

    A simple and quite robust solution can be done with a multicast of WM_COPYDATA, if the requirements can be met with its constraints.

    Obligatory rant: new help in VS2010 and latest SDK *sucks*. I want index back.

    [If your application encounters an unhandled exception, the kernel will TerminateProcess it. Are you saying that unhandled exceptions are not a real life scenario? And it looks like you're saying that the rule for WAIT_ABANDONED is that the person who gets a WAIT_ABANDONED should mark the shared data as corrupted so that nobody will use it, and the only way to get it working again is to reboot your server. Some people might want a system with a better error recovery model than "reboot the server." -Raymond]
  14. Danny says:

    I had to implement this in a bigger real-life application, which was a database administration app with multiple databases scattered all over the internet, of course all of them in the same VPN. The problem was to not disrupt the databases' work while upgrading the structure inside, and also not to disrupt applications already connected to those databases when a new version of the application was released into the wild. So my real app was actually the exercise Ray gave, since my previous n-1 version was holding the scripts to upgrade to version n (of the software or/and the app; both could be upgraded in the same new version). My implementation was to write a watchdog which on timeout would trigger a rollback.

    And for the case of one upgrade happening very close to the next – a customer will have an older version of the app that will upgrade a freshly installed database (which is the oldest in version terms since the installer is installing version 1.0) and let's say a developer will connect to the same database with the latest version – well, for this case I relied on the database atomicity of the sessions – big thanks to the developers of PostgreSQL for it.

    The entire solution is in continuous update; new versions come every 2 weeks on a regular basis, and this has been happening for over a year now, so I think I solved your exercise, Ray.

  15. Gabe says:

    Alex: On my computer it takes 20 times more CPU cycles to write a file one line (80 bytes) at a time as compared to 64k at a time. My educated guess is that the system call overhead is eating up all those cycles. That also explains why protecting the buffer with a mutex (adding one Wait and one Release system call per line logged) makes it several times slower than using a lock-free buffer (but still about 25% faster than the version that calls WriteFile for each log line).

    My application with a shared mutex was protecting an I/O device. If a process is terminated while using the device, it ceases to be using it, thus freeing up the mutex (and thus the device) for any of the still-running processes to use it. If I didn't use a mutex, multiple processes could try to use the device simultaneously, causing disastrous results. Still, if you think you're smarter than me, I'd love to know how I'm mistaken.

  16. Neil says:

    So in the case of single writer, multiple readers, how many SHAREDDATA structures should you expect to need? I was thinking maybe you only need two but maybe that's a bit optimistic.

    [It depends. How frequent are the updates to the SHAREDDATA? -Raymond]
  17. David Walker says:

    Did anyone else notice the musical echo from the title of the post?

  18. Alex Grigoriev says:

    When I was younger, so much younger than today,

    I've never needed anybody's help in any way…

  19. Anon says:

    "For example, the SLIST functions simply assume that a processor won't get locked out more than 65535 times in a row."

    Is that actually documented anywhere? I'm not finding anything on MSDN or Google.

    What happens if a processor *does* get locked out more than 65535 times in a row? Potential corruption of the list structures?
