Playing with synchronization barriers


A synchronization barrier is a synchronization object that works like this:

  • A synchronization barrier knows how many threads it is managing.
  • Each thread that calls Enter­Synchronization­Barrier blocks.
  • When the last thread enters the synchronization barrier, all threads are released.
  • Once a thread exits the synchronization barrier, it can re-enter it, at which point it blocks again, and the cycle repeats.

The idea here is that you have a multi-step process, and each thread must complete a step before any thread can go on to the next step. For example, you might have a sequence of steps like this:

  • Everybody enters the room and sits down.
  • Wait for everybody to be seated.
  • Start the movie.
  • Everybody watches the movie.
  • After the movie ends, everybody leaves the room and stands outside.
  • Wait for everybody to leave the room.
  • One person locks the door.
  • Everybody says good-bye and goes home.

The synchronization barrier takes care of the "Wait for everybody to..." steps.

It is not uncommon that an action needs to be taken after all threads have cleared the barrier, and the action needs to be performed exactly once. In the example above, one such action is "Start the movie." To support this pattern, the Enter­Synchronization­Barrier function returns TRUE to exactly one thread; all the other threads get FALSE.

Here's a Little Program that demonstrates the synchronization barrier pattern. Each thread is a person who wants to watch the movie.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define WATCHERS 4

SYNCHRONIZATION_BARRIER barrier;
HANDLE movieFinished;

// Watchers gonna watch.
DWORD CALLBACK MovieWatcherThread(void* p)
{
  int id = PtrToInt(p);

  // Build a string that we use to prefix our messages.
  char tag[WATCHERS + 1];
  for (int i = 0; i < WATCHERS; i++) {
    tag[i] = (i == id) ? (char)('1' + i) : ' ';
  }
  tag[WATCHERS] = 0;

  printf("%s Sitting down\n", tag);

  if (EnterSynchronizationBarrier(&barrier, 0)) {
    // We are the one who should start the movie.
    printf("%s Starting the movie\n", tag);

    // For demonstration purposes, the movie is only one second long.
    LARGE_INTEGER dueTime;
    dueTime.QuadPart = -10000000LL; // negative = relative time, in 100ns units
    SetWaitableTimer(movieFinished, &dueTime, 0,
                     nullptr, nullptr, FALSE);
  }

  // Watch the movie until it ends.
  printf("%s Enjoying the movie\n", tag);
  WaitForSingleObject(movieFinished, INFINITE);

  // Now leave the room.
  printf("%s Leaving the room\n", tag);

  if (EnterSynchronizationBarrier(&barrier, 0)) {
    // We are the one who should lock the door.
    printf("%s Locking the door\n", tag);
  }

  printf("%s Saying good-bye and going home\n", tag);
  return 0;
}

int __cdecl main(int, char**)
{
  movieFinished = CreateWaitableTimer(nullptr, TRUE, nullptr);
  InitializeSynchronizationBarrier(&barrier, WATCHERS, -1);

  HANDLE threads[WATCHERS];
  for (int i = 0; i < WATCHERS; i++) {
    DWORD threadId;
    threads[i] = CreateThread(nullptr, 0, MovieWatcherThread,
                              IntToPtr(i), 0, &threadId);
  }

  // Wait for the demonstration to complete.
  WaitForMultipleObjects(WATCHERS, threads, TRUE, INFINITE);

  CloseHandle(movieFinished);
  DeleteSynchronizationBarrier(&barrier);
  return 0;
}

Each thread represents a person who wants to watch the movie. We sit down, and then wait for everybody else to sit down by entering the synchronization barrier. The call to Enter­Synchronization­Barrier returns when everybody has sat down.

The watcher whose call to Enter­Synchronization­Barrier returned TRUE takes responsibility for starting the movie.

And then everybody watches the movie, waiting until the movie ends.

Once that's done, each person leaves the room, and then enters another synchronization barrier to wait until everybody is outside.

The synchronization barrier will nominate one person to lock the door.

Everybody says good-bye and goes home.

There is one tricky thing about synchronization barriers: knowing when it is safe to re-enter a synchronization barrier after exiting it. If the synchronization barrier is always used by the same pool of threads (as we do here), then there's no problem: a thread has clearly exited the synchronization barrier if it manages to run code and try to enter it again.

The tricky part is if the collection of threads participating in the barrier changes over time. The rule is that when a thread exits the barrier, it releases a "unit of synchronization" that another thread can consume by entering the barrier. Usually, the unit is consumed by the same thread that exited the barrier, but that thread could instead hand the unit to another thread and let that other thread take over the job. It is important that there be a causal link between the exiting thread and the future entering thread, so that the future entering thread knows that the previous exiting thread has fully exited.

Suppose there are three steps: Step 1 is handled by threads A and B, and Steps 2 and 3 are handled by threads B and C. You don't want to do this:

Thread A                       | Thread B                       | Thread C
-------------------------------|--------------------------------|-------------------------------
Work on step 1                 | Work on step 1                 | Idle
Enter­Synchronization­Barrier  | Enter­Synchronization­Barrier  |
                               | Signal thread C to start       |
                               | Work on step 2                 | Work on step 2
                               | Enter­Synchronization­Barrier  | Enter­Synchronization­Barrier
                               | Work on step 3                 | Work on step 3

The problem here is that thread C may call Enter­Synchronization­Barrier before thread A has exited it. There is no causal link between thread A exiting the barrier and thread C entering it, which creates the race condition.

In this case, you can solve the problem by having thread A be the one to signal thread C to start working on step 2. That way the "unit of synchronization" is positively handed from thread A to thread C.

Comments (24)
  1. Medinoc says:

    >"In this case, you can solve the problem by having thread A be the one to signal thread C to start working on step 2. That way the "unit of synchronization" is positively handed from thread A to thread C."

    I have trouble seeing how this doesn't cause a race condition between threads B and C... Isn't there a risk that C may enter the barrier before B is done leaving it (after entering it the first time)?

    [This happens even in the movie-watching scenario: The first person to leave the room may try to enter the synchronization barrier before the last person has left it, and that's fine and expected. -Raymond]
  2. Brian_EE says:

    According to MSDN, these functions are new for Windows 8, i.e. won't work on Win7. I would make the suggestion that articles point out when the topic of interest is only available on some of the current MS supported OS versions.

  3. pinwing says:

    There is a typo in title: synchroninzation -> synchronization

  4. EduardoS says:

    Let's see if I understand the "summary": all threads are released, but that doesn't mean every thread has "exited" the barrier? Otherwise, if thread B exits then thread A would also have to exit.

  5. Adam Rosenfield says:

    I'm not sure I follow how the race condition works.  Is the problem that, if we get unlucky with the OS thread scheduler, that thread A could still have yet to exit Enter­Synchronization­Barrier by the time that thread C enters Enter­Synchronization­Barrier, in which case the barrier will have the requisite 2 threads waiting on it and let thread C continue before A has reached it?  That seems broken to me — I would have thought that even if the OS hasn't yet scheduled thread A, it wouldn't be counted against the target of 2 threads waiting for the barrier. And if that is the case, why wouldn't the same race happen in the case when the same threads are reused with the barrier?

    (Aside: a synchronization barrier is also known as a rendezvous in some systems)

    [The synchronization barrier is a user-mode concept, like a critical section. The kernel scheduler doesn't know anything about synchronization barriers. It's the same race condition you have between somebody leaving a critical section and another thread deleting it. You can't delete the critical section until the code that is exiting the critical section has fully exited. (There is no race when the same thread re-enters the barrier because it has definitely exited the barrier before re-entering.) -Raymond]
  6. PeteC says:

    @Adam, I agree, compared to pthread barriers this seems like a poor design and easy to get wrong. Once the barrier has reached the required thread count, the threads should all be released simultaneously! Regardless of how long it takes any thread to be scheduled again, the barrier should be available for immediate re-use with thread count 0. Having a state which is "released but not exited" is asking for trouble.

    [Synchronization barriers are user-mode concepts. The scheduler releases the thread, but user-mode code needs to run in order to do the necessary bookkeeping. The scheduler can't do the bookkeeping because the scheduler doesn't have any concept of a synchronization barrier. -Raymond]
  7. Adam Rosenfield says:

    I still don't follow.  Consider these two execution traces:

    Trace 1 (where thread B signals thread C):

    1) Threads A & B call EnterSynchronizationBarrier

    2) Thread B wakes up and signals thread C

    3) Thread C wakes up and calls EnterSynchronizationBarrier.  There are now 2 threads inside the barrier (A & C).

    4) Thread C exits EnterSynchronizationBarrier and proceeds past the barrier, but thread B has not yet reached it.  Thread A is still waking up before the exit of EnterSynchronizationBarrier and has not yet been scheduled by the OS. [*]

    Trace 2 (where threads A & B synchronize twice, as in the movie analogy):

    1) Threads A & B call EnterSynchronizationBarrier

    2) Thread B wakes up, performs work, and calls EnterSynchronizationBarrier again.  There are now 2 threads inside the barrier again.

    3) Thread B wakes up again and proceeds past the barrier, but thread A has not yet reached it.  It is still waking up before the exit of the first call to EnterSynchronizationBarrier and has not yet been scheduled by the OS. [*]

    In both of the cases marked [*], a thread exits EnterSynchronizationBarrier (B) and a thread subsequently enters EnterSynchronizationBarrier (C in trace 1, B in trace 2) before the first thread (A) has exited the first call to EnterSynchronizationBarrier.  If this causes EnterSynchronizationBarrier to say "oh hey, I've got all of the waiting threads I need, time to let you guys all go!", then there's the same race condition in both cases.  And if those traces *don't* cause EnterSynchronizationBarrier to say that (say, because it's smart enough not to count thread A in its list of waiting threads), then I don't understand what the race is.

    What am I misunderstanding?

    [In trace 2's step 2, there are two threads inside the barrier, but one is entering and one is exiting. Only entering threads count toward deciding whether the barrier has enough threads waiting. (Because the exiting threads aren't waiting.) I think I'll write a follow-up article on this, since it seems to be confusing. -Raymond]
  8. Darran Rowe says:

    @Adam Rosenfield:

    I think the problem is that what happens when the synchronization barrier releases all of the threads is implied and assumed to be the most logical thing to do here.

    When I first read this, I didn't see why it would do anything but just clean up after the operation and exit, and it would only wait on a handle once for each call to EnterSynchronizationBarrier. After getting to the end, I also saw immediately that the race condition talked about was on the structure itself, because if things turn out that way, we could end up in a situation where you are trying to add a third thread to a barrier that only handles two threads. In the case given, C was meant to replace A in the barrier, but because B signalled C to start working again, then there was no connection to whether A had actually exited the barrier.

    The question that I asked myself then is, do I have some kind of weird affinity with the MSDN documentation, or are other programmers jaded enough to expect weird things.

    Well, either way, maybe a bit more information in the Synchronization Barrier section of the MSDN, or even a bit more in the remarks section of EnterSynchronizationBarrier would help alleviate this in future.

  9. __klg says:

    Raymond you have misspelled 'synchronization' in your title.

  10. S says:

    @Darran Rowe "I also saw immediately that the race condition talked about was on the structure itself":

    Why? Somehow, the implementation manages to return TRUE to exactly one of the threads. To do this, I expect that one of the threads (the last one calling EnterSynchronizationBarrier) will continue to run, will therefore be able to reset the structure without interfering with the other threads, will then wake up all the other threads by PulseEvent or similar, and will return TRUE in the end.

    The expectation depends on the mental model of the implementation of this feature.

  11. PeteC says:

    > The problem here is that thread C may call Enter­Synchronization­Barrier before thread A has exited it.

    Well now I don't see the problem at all. In your reply above (Raymond) you say "Only entering threads count toward deciding whether the barrier has enough threads waiting". So thread A is exiting and doesn't count, so where is the race? What is the problem with C calling Enter­Synchronization­Barrier before A has exited, if A is not counting towards the total?

    Another post on this topic would be useful.

  12. skSdnW says:

    If all threads are released at the same time then there is a race to be the first to printf and some threads might be enjoying the movie before the last thread has started playing the movie?

  13. Adam Rosenfield says:

    PeteC, my thoughts exactly.  I don't see where the race is now.  Agreed that another post on this topic would be very helpful.

  14. sense says:

    RealWorld™ analogies were real good and very fun, but is this type of synchronization really useful in programming? I'd love to hear practical applications for it.

  15. Joe says:

    Great, one more new thing that I now have to use so my team lead has to figure out yet another new thing against his will.

  16. 640k says:

    @Brian_EE: According to MSDN, these functions are new for Windows 8, i.e. won't work on Win7.

    How surprising. Ray is working part time in the marketing department again.

  17. alegr1 says:

    When a (batch of) threads exits the barrier, the API needs to return the sequence # of the batch. This will take care of ordering. In any case, if you have more threads than the batch size, the latter batch threads may even get scheduled earlier than former. There cannot be any ordering.

    Since the API doesn't support sequence #, it needs to be generated by InterlockedIncrement and then modulo N.

  18. Darran Rowe says:

    @S:

    Simply put, I'm psychic and amazing.

    But being serious here, you have to remember a few things about this post. First, the problematic area is what happens if you reuse a structure before it has been fully cleaned up. This should be pure undefined behaviour regardless. While you are saying that the thread that unblocks the barrier should do the cleanup, is that really true, or what happens? This is implementation defined hence why it is undefined behaviour.

    The problem that people seem to be having stemmed from that last example where Raymond talked about that three thread setup. Again, the cause of that problem was that there is a race in the usage of the structure between when the barrier is cleaned up and when it is used again. That example was about how thread B could signal thread C to work, and then C could enter the barrier all before A could leave the barrier. The race here was how C was unaware that A even existed. This meant that C could happily go into the barrier before thread A, the thread that C was to replace, actually finished its work.

    There is of course lots to say about this, the barrier could protect itself against this kind of thing easily, but that isn't the point. The point is still that the behaviour is undefined in this situation and you really shouldn't press your luck.

  19. Magnus says:

    To: sense - 24 Nov 2015 9:13 AM

    I once had to test some custom-written CAS (compare and swap) operations.  I had ported IBM's IBM Open Classes framework from their VisualAge C++ compiler to MSVC, and had to re-implement the CAS operations because they were written in assembler and they hard-coded VisualAge's ABI and calling conventions.

    To test them I had to get multiple threads to race each other along an array, CASing the same elements at the same time.  This would only work if I could get all threads started, and then all waiting at the starting line, before pulling the trigger.

    I had to implement my own synchronisation barrier (I didn't actually know there was a name for such a thing, thanks Raymond!) using C++11 STL's new (at the time) synchronisation primitives, which was quite fiddly.  But I knew what I wanted to do, this type of thing was a bit more common in BeOS (RIP) using BeOS-style semaphores.

    So not quite real-world production code as I used it for testing, but still something I had to do for work.

  20. S says:

    "the problematic area is what happens if you reuse a structure before it has been fully cleaned up ...  there is a race in the usage of the structure between when the barrier is cleaned up and when it is used again"

    For example, the mechanism could roughly work like this:

    For N threads, InitializeSynchronizationBarrier() creates N auto-reset events and stores the handles inside the structure. Also, a counter is set to N.

    For each thread, EnterSynchronizationBarrier() enters a critical section to (a) decrement this counter (=> the resulting value is C) and (b) if C > 0, find an unused event handle in the structure and mark it as in-use. Then the critical section is left.

    If C is non-zero, the thread calls WaitForSingleObject() on the event handle. After wakeup, it again acquires the critical section to mark the event handle in the structure as unused, then returns FALSE.

    If C is zero, the thread resets the counter to N (stored in another field, which also allows range-checking the C value), then iterates through all the event handles marked as in-use, calling SetEvent() on each, then returns TRUE.

    => There are always enough resources in the structure to allow each of the threads (even the true-returning one) to re-enter EnterSynchronizationBarrier as fast as possible, even when none of the other threads got any CPU time in-between.

    => It does not matter if one thread is replaced by another thread.

    => When the barrier is unblocked, it starts over from scratch (as expected, at least by me).

    [What about if C is non-null, but there are no available event handles? (Because the thread that is using the event handle hasn't been scheduled yet.) -Raymond]
  21. Darran Rowe says:

    @S:

    As Raymond said, "What about if C is non-null, but there are no available event handles? (Because the thread that is using the event handle hasn't been scheduled yet.)"

    I think the thing that you are still missing is that the issue isn't about N threads entering a synchronization barrier large enough to store N threads, but that if the set of threads working on a problem changes over time, we can have more than N threads entering a barrier large enough to store N threads.

    To use concrete numbers, let's say a task is split between two threads, threads A and B. At some point in the future, thread C is going to replace thread A.

    So we create a barrier with room to store two threads. Thread A gets the processor and quickly finishes its work, enters the barrier, and waits. B takes a bit more time to finish its work, so A goes to sleep. B finishes its work, enters the barrier, and then the barrier is released. B continues working, cleaning up after itself, signals C to start working, and then goes about doing the next stage of its work and finally re-enters the barrier, this time having to wait.

    So right now, there are already two threads in the barrier, A, which is taking a long time to wake up because it isn't good in the mornings, and B still wide awake.

    At this time, C is happily working away at the task that it is meant to do and then enters the barrier. Yes, this is a third thread entering a barrier only large enough to store two threads. ???!!! Blorf?

    This is the essence of the data race on the barrier, and it can happen. If A has had its priority adjusted so it was lower than B and C, then it could take some time for the thread to be rescheduled, especially if the system is busy. It is also a problem that humans aren't really good at figuring out because it is time-based.

    This is why I normally err on the side of caution and treat the function calls as the indicator of when it is safe to reuse things. Until all calls to EnterSynchronizationBarrier have returned after the barrier has been released, I class it as still being in use and would try not to use it again.

    [The scenario you describe is illegal. The number of threads a barrier waits for must remain fixed for the lifetime of the barrier. If you have a workflow where the number of threads varies, you need multiple barriers. In the above example, you would have a two-thread barrier for step 1 and a three-thread barrier for step 2. -Raymond]
  22. S says:

    @Darran Rowe: Raymond wrote "The tricky part is if the collection of threads participating in the barrier changes over time." and you described it as a problem with the management of the barrier structure, as if the implementation of EnterSynchronizationBarrier expects the exit of ALL threads from EnterSynchronizationBarrier before it can be reused.

    To counter this, I presented a way to implement EnterSynchronizationBarrier that does not have any book-keeping problem, regardless of timing/thread scheduling, AS LONG AS the application respects the contract of InitializeSynchronizationBarrier / EnterSynchronizationBarrier: At any time, the number of threads calling EnterSynchronizationBarrier must not exceed the number defined by InitializeSynchronizationBarrier. You cannot have a third thread calling EnterSynchronizationBarrier if you have called InitializeSynchronizationBarrier with 2, and 2 threads are still within EnterSynchronizationBarrier.

  23. Darran Rowe says:

    @Raymond:

    Of course it is illegal. But no, I didn't need a three thread barrier for step 2 because the entire point was to show a situation where a thread could hang around too long in a barrier after it was meant to leave it. I thought I made it clear enough that thread A was still in the barrier not because it called EnterSynchronizationBarrier again, but it still hadn't returned from its only call to it.

    It was meant to be a bit of a live example of the situation described in your own post which you started off with "Suppose there are three steps: Step 1 is handled by threads A and B, and Steps 2 and 3 are handled by threads B and C. You don't want to do this:" without the third step. Obviously I failed badly.

    @S:

    "you described it a problem with the management of the barrier structure, as if the implementation of EnterSynchronizationBarrier expects the exit of ALL threads from EnterSynchronizationBarrier before it can be reused."

    Maybe so, but it seems I failed a second time. Go me, wonderful post that was.

    There was supposed to be a bit of a tone shift between my first reply to you, which was also working on the fact that the documentation, as of the time of posting, doesn't actually document the reuse of a structure. In these cases, the behaviour that you would then be using would be undocumented which is undefined.

    The second reply was then a failed attempt to give a bit of an example as to how even your design has flaws if you mess up with the whole change of thread situation.

    Note to self, stop trying to explain things to people, you suck at it...

  24. Adam Rosenfield says:

    @Darran Rowe: I think your example is the same as my trace 1 (correct me if I'm wrong), in which case Raymond's reply "Only entering threads count toward deciding whether the barrier has enough threads waiting. (Because the exiting threads aren't waiting.)" still applies.  Your thread C would be the second thread, not the third, since thread A, who's not good in the mornings, doesn't count.

Comments are closed.
