You can use a file as a synchronization object, too


A customer was looking for a synchronization object that had the following properties:

  • Can be placed in a memory-mapped file.
  • Can be used by multiple processes simultaneously. Bonus if it can even be used by different machines simultaneously.
  • Does not leak resources if the file is deleted.

It turns out there is already a synchronization object for this, and you've been staring at it the whole time: The file.

File locking is a very old feature that most people consider old and busted because it's just one of those dorky things designed for those clunky database systems that use tape drives like they have in the movies. While that may be true, it's still useful.

The idea behind file locking is that every byte of a file can be a synchronization object. The intended pattern is that a database program indicates its intention to access a section of a file by locking it, and this prevents other processes from accessing that same section of the file. This allows the database program to update the file without race conditions. When the database program is finished with that section of the file, it unlocks it.

One interesting bit of trivia about file locking is that you can lock bytes that don't even exist. It is legal to lock bytes beyond the end of the file. This is handy in the database case if you want to extend the file. You can lock the bytes you intend to add, so that nobody else can extend the file at the same time.

The usage pattern for byte-granular file locks maps very well to the customer's requirements. The synchronization object is... the file itself. And you put it in the file by simply choosing a byte to use as the lock target. (And the byte can even be imaginary.) And if you delete the file, the lock disappears with it.

Note that the byte you choose as your lock target need not be dedicated for use as a lock target. You can completely ignore the contents of the file and simply agree to use byte zero as the lock target. You just have to understand that when the byte is locked, only the owner of the lock can access it via the ReadFile and WriteFile family of functions. (Reading or writing a byte that is locked by somebody else will fail with ERROR_LOCK_VIOLATION. Note that access via memory-mapping is not subject to file locking, which neatly lines up with the customer's first requirement.)

To avoid the problem with locking an actual byte, you can choose imaginary bytes at ridiculously huge offsets purely for locking. Since those bytes don't exist, you won't interfere with other code that tries to read and write them. For example, you might agree to lock byte 0xFFFFFFFF`FFFFFFFF, on the assumption that the file will never become sixteen exabytes in size.

File locking supports the reader/writer lock model: You can claim a lock for shared access (read) or for exclusive access (write).

The basic LockFile function is a subset of the more general LockFileEx function, so let's look at the general function.

To lock a portion of a file, you call LockFileEx with the range you want to lock, the style of lock (shared or exclusive), and how you want failed locks to be handled. To release the lock, you pass the same range to UnlockFileEx. Note that ranges cannot be chopped up or recombined. If you lock bytes 0–10 and 11–19 with separate calls, then you must unlock them with separate matching calls; you can't make a single bulk call to unlock bytes 0–19, nor can you do a partial unlock of bytes 0–5.

Most of the mechanics of locking are straightforward, except for the "how you want failed locks to be handled" part. If you specify LOCKFILE_FAIL_IMMEDIATELY and the lock attempt fails, then the call simply fails with ERROR_LOCK_VIOLATION and that's the end of it. It's up to you to retry the operation if that's what you want.

On the other hand, if you do not specify LOCKFILE_FAIL_IMMEDIATELY, and the lock attempt fails, then the behavior depends on whether the handle is synchronous or asynchronous. If synchronous, then the call blocks until the lock is acquired. If asynchronous, then the call returns immediately with ERROR_IO_PENDING, and the I/O completes when the lock is acquired.

The documentation in MSDN on how lock failures are handled is a bit confusing, thanks to tortured sentence structure like "X behaves like Y if Z unless Q." Here is the behavior of lock failures in table form:

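If LockFileEx fails                      | Asynchronous handle               | Synchronous handle
-----------------------------------------+-----------------------------------+-----------------------------------
LOCKFILE_FAIL_IMMEDIATELY specified      | Returns FALSE immediately.        | Returns FALSE immediately.
                                         | Error code is                     | Error code is
                                         | ERROR_LOCK_VIOLATION.             | ERROR_LOCK_VIOLATION.
-----------------------------------------+-----------------------------------+-----------------------------------
LOCKFILE_FAIL_IMMEDIATELY not specified  | Returns FALSE immediately.        | Blocks until lock is acquired,
                                         | Error code is ERROR_IO_PENDING.   | returns TRUE.
                                         | I/O completes when lock is        |
                                         | acquired.                         |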

Here's a little test app that exercises all the options. Run the program with two command line options. The first is the name of the file you want to lock, and the second is a string describing what kind of lock you want. Pass zero or more of the following letters:

  • "o" to open an overlapped (asynchronous) handle; otherwise, it will be opened non-overlapped (synchronous).
  • "e" to lock exclusively; otherwise, it will be locked shared.
  • "f" to fail immediately; otherwise, it will wait.

For example, you would pass "ef" to open a synchronous handle and request an exclusive lock that fails immediately if it cannot be acquired. If you want all the defaults, then pass "" as the options.

#include <windows.h>
#include <stdio.h>
#include <tchar.h>

int __cdecl _tmain(int argc, TCHAR **argv)
{
 // Ensure correct number of command line arguments
 if (argc < 3) return 0;

 // Get the options
 DWORD dwFileFlags = 0;
 DWORD dwLockFlags = 0;
 for (PTSTR p = argv[2]; *p; p++) {
  // Use TEXT() so the comparison matches TCHAR in both ANSI and Unicode builds
  if (*p == TEXT('o')) dwFileFlags |= FILE_FLAG_OVERLAPPED;
  if (*p == TEXT('e')) dwLockFlags |= LOCKFILE_EXCLUSIVE_LOCK;
  if (*p == TEXT('f')) dwLockFlags |= LOCKFILE_FAIL_IMMEDIATELY;
 }

 // Open the file
 _tprintf(TEXT("Opening the file '%s' as %s\n"), argv[1],
          (dwFileFlags & FILE_FLAG_OVERLAPPED) ?
          TEXT("asynchronous") : TEXT("synchronous"));
 HANDLE h = CreateFile(argv[1], GENERIC_READ,
                FILE_SHARE_READ | FILE_SHARE_WRITE,
                NULL, OPEN_EXISTING,
                FILE_ATTRIBUTE_NORMAL | dwFileFlags, NULL);
 if (h == INVALID_HANDLE_VALUE) {
  _tprintf(TEXT("Open failed, error = %lu\n"), GetLastError());
  return 0;
 }

 // Set the starting position in the OVERLAPPED structure
 OVERLAPPED o = { 0 };
 o.Offset = 0; // we lock on byte zero

 // Say what kind of lock we want
 if (dwLockFlags & LOCKFILE_EXCLUSIVE_LOCK) {
  _tprintf(TEXT("Requesting exclusive lock\n"));
 } else {
  _tprintf(TEXT("Requesting shared lock\n"));
 }

 // Say whether we're going to wait to acquire
 if (dwLockFlags & LOCKFILE_FAIL_IMMEDIATELY) {
  _tprintf(TEXT("Requesting immediate failure\n"));
 } else if (dwFileFlags & FILE_FLAG_OVERLAPPED) {
  _tprintf(TEXT("Requesting notification on lock acquisition\n"));
  // The event that will be signaled when the lock is acquired
  // error checking deleted for expository purposes
  o.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
 } else {
  _tprintf(TEXT("Call will block until lock is acquired\n"));
 }

 // Okay, here we go.
 _tprintf(TEXT("Attempting lock\n"));
 BOOL fRc = LockFileEx(h, dwLockFlags, 0, 1, 0, &o);

 // If the lock failed, remember why.
 DWORD dwError = fRc ? ERROR_SUCCESS : GetLastError();
 _tprintf(TEXT("Lock attempt %s, error code %lu\n"),
          fRc ? TEXT("succeeded") : TEXT("failed"), dwError);

 if (fRc) {
  _tprintf(TEXT("Lock acquired immediately\n"));
 } else if (dwError == ERROR_IO_PENDING) {
  _tprintf(TEXT("Waiting for lock\n"));
  WaitForSingleObject(o.hEvent, INFINITE);
  fRc = TRUE; // lock has been acquired
 }

 // If we got the lock, then hold the lock until the
 // user releases it.
 if (fRc) {
  _tprintf(TEXT("Hit Enter to unlock\n"));
  getchar();
  UnlockFileEx(h, 0, 1, 0, &o);
 }

 // Clean up
 if (o.hEvent) CloseHandle(o.hEvent);
 CloseHandle(h);
 return 0;
}

When you run this program, it will try to acquire the lock in the manner requested, and if the lock is successfully acquired, it will wait for you to press Enter, then it will release the lock.

You naturally need to run multiple copies of this program to see how the flags interact. (If you run only one copy, then it will always succeed.)

Exercise: What changes would you make if you wanted to wait at most 5 seconds to acquire the lock? (Hint.)

Comments (35)
  1. Simon Farnsworth says:

    IIUC, the exercise has a nasty gotcha in it. The obvious change to make is to change "WaitForSingleObject(o.hEvent, INFINITE);" to wait for no more than 5 seconds (instead of INFINITE).

    The issue with doing this naively is that, as per the linked Hint, the I/O can complete at the wrong time and obliterate reused memory. The solution is in the Hint – after "WaitForSingleObject(o.hEvent, 5000);", call "CancelIo(h); GetOverlappedResult(h, &o, TRUE);", which both cancels the requested I/O and ensures that the OVERLAPPED has been returned to you, so no memory trample can happen.

  2. Madge says:

    "You're soaking in it."

  3. alegr1 says:

    Does CancelIo cancel FIOCTLs? LockFile FIOCTL may or may not be cancelable.

    Also, keep in mind that the file lock scope is a handle, not a thread. If you want different threads locking against each other, open separate handles to the file.

  4. mikeb says:

    Clever. I don't think it would have ever occurred to me to use a file lock for something other than to make sure that reads/writes to portions of the file were safe from corruption.

    And alegr1's helpful note would have almost certainly bitten me if I tried this for thread synchronization.

  5. NB says:

    Good to know, I wasn't aware file locking was so powerful and flexible.

  6. alegr1 says:

    @mikeb:

    Also, duplicated handles within the same process may or may not have separate locking scope.

  7. Lev says:

    But file locking is broken on some file systems (e.g., some NASes).

  8. jader3rd says:

    Does .Net always specify LOCKFILE_FAIL_IMMEDIATELY then? Because I hate dealing with trying to open a file, only to have it run into a "File is already open" exception.

  9. alegr1 says:

    @jader3rd:

    File locking and SHARE flags are two different things.

  10. Joshua says:

    @Kevin: That's why /var/lock is cleared on boot and why signal handlers existed since the bad old days. Unlike on Windows, unlinking the lock file was guaranteed to work (yes even if open).

  11. Kevin says:

    With careful use of open() and unlink(), it is common to use the *existence* of a file as a lock, at least under Unix (whose file-level locking is basically a non-feature).  This has the blessing and curse that the lock remains locked if you crash while holding it.  It's a blessing because it forces people to clean up their inconsistent data.  It's a curse because it forces *you* to clean up your inconsistent data.

  12. amroamroamro says:

    For anyone who tries the example using MinGW compilers, I had to disable output buffering to see the printed messages (or explicitly flush after each call): stackoverflow.com/…/1716621

    [I don't see why that was necessary. All of my print statements end in \n, so the buffer should have been flushed anyway (because stdout is line-buffered or unbuffered if the device is interactive). -Raymond]
  13. Kevin says:

    @Joshua

    Sure, that's great for daemons, but what about everyone else?

  14. Major says:

    In C#, how does this technique compare to using Mutex?

    [I don't see how a Mutex solves the problem. Can you store a Mutex in a memory-mapped file? -Raymond]
  15. loreb says:

    @Raymond I don't have Visual Studio, but I can confirm that on mingw printf("whatever\n") doesn't flush stdout (and if memory serves me right not even stderr); according to msdn.microsoft.com/…/86cebhfs(v=vs.120).aspx line buffering on win32 is just an alias for full buffering.

    I bet you could write a post about it :)

  16. @Lev: any NAS that supports SMB but doesn't do file locking properly should be put out to pasture, or better still shot immediately as a warning to others.  Not the programmer's problem.

  17. Anon says:

    SMB2+ breaks file locking. DO NOT USE FILE LOCKING on WinVista+. SMB2 does not guarantee consistency. This is why all dbase-style shared flat-file databases are broken as hell on Vista+.

  18. acq says:

    Anon, can you please elaborate re: "SMB2+ breaks file locking"?

  19. Klimax says:

    @Anon

    WTF are you talking about?

    Just for reference here is doc for SMB2 msdn.microsoft.com/…/cc246482.aspx

    To my knowledge you are first to talk about that supposed massive problem. And incidentally there was nothing about this on Subversion dev mailing list. (as it would be applicable to SVN repo files)

  20. cheong00 says:

    @Lev: For NAS that don't properly support locking, you can't prevent access from ***other machines*** accessing the file at the same time anyway.

  21. Joshua says:

    @Klimax: subversion is from the Unix world. It doesn't depend on file locking. Network file locking has always been broken.

  22. John Elliott says:

    I maintain a product whose database is multiple flat files, using file locking to enforce consistency. It did indeed encounter some odd issues on a couple of SMB2 systems (writes would succeed but the files on disk would be left untouched, leaving the database inconsistent). Of course, the users went to Microsoft, and Microsoft said SMB2 wasn't buggy so the fault had to lie in our program. About six months later the users loaded a Microsoft hotfix for a (supposedly) unrelated problem, and the file corruption suddenly stopped happening.

  23. Joshua says:

    For those of you who can't find the SMB2+ file locking problem:

    Look at the auto-reconnect behavior: Connections can be lost and restored, apparently transparent to the user, and we believe user code. We have tested this and found it to be as true as we can test for including locks coming back after server reboots. (We had to take drastic measures after a month-old stuck lock despite weekly server reboots–the reboots were specifically for killing SQL locks that were breaking the backup process [not a bug–idiot engineers kept leaving transactions open in SQL windows]).

    Consider the following scenario: Client #1 takes file lock. Connection breaks in a way immediately noticed by server but not immediately by client. Server releases lock but client doesn't know about it. Client #2 takes lock, manipulates file, and releases lock. Client #1 discovers the disconnect by timeout, reconnects, and reestablishes locks. Lock reestablishment succeeds.

    To the claim of the API got it right: MoveFile's actual semantics don't match its documented semantics: http://www.virtualbox.org/…/2350 . Once having found one case of the documentation simply being wrong, finding more should not be surprising. Also, the samba code indicates the existence of rebuild open file handles on reconnect.

  24. bzakharin says:

    So is the bonus requirement not met? Locking does not work over the network. I found out the hard way. I was working on a problem in some very old code (It talked to telex machines via modem, that old) which had the problem for years (probably from day one of the Windows port. The original code was for VMS), but very intermittently, of one received message overwriting another on disk. Turns out we were doing file locking over the network. We fixed it. I don't remember how, but obviously it was some other sort of synchronization. Less than a year later Verizon, who acquired MCI, shut down telex, and the entire product became obsolete.

  25. Joshua says:

    @Boriz Zakharin: Network file locking has always been broken. I can prove the following requirements triad has no solution: recover locks from transport (read: TCP/IP but changing protocol doesn't help) disconnect, clean up lock on client crash, and locks do not exist on server disk.

  26. bzakharin says:

    So how is "Bonus if it can even be used by different machines simultaneously" achieved then?

  27. Joshua says:

    @Boriz Zakharin: I suggest NTFS Transactional API.

  28. Medinoc says:

    The problem I see with this is that you can't lock a synchronous file handle with a timeout: You only get the choice between zero and INFINITE, or switching to an asynchronous mode.

  29. Anon says:

    @klimax

    Here's the whitepaper:

    http://www.dataaccess.com/…/opportunlockingreadcaching.html

    Here's ONE of the patches to correct part of the issue:

    support.microsoft.com/…/2028965

  30. Kevin says:

    @Joshua

    Microsoft is considering dropping NTFS transactions entirely:

    msdn.microsoft.com/…/hh802690%28v=vs.85%29.aspx

  31. Joshua says:

    OK that's bizarre. The low uptake is due to needing to support XP and 2003 for a few more years. Once again, MS doesn't look at the reason for the low uptake and thinks low uptake -> drop.

  32. Anon says:

    @Joshua

    The problem is that there's no "need" to support XP/2003. They're ancient. Obsolete. Insecure. If they were Apple products, Apple would start denying that they ever existed and deleting ("archiving") KB articles.

    There is no valid reason for anyone to not be using Win7 at this point for desktop machines, especially given how well it runs on older hardware and the massive number of stability improvements that were made over XP.

  33. Gabe says:

    Anon: The need to support XP/2003 comes from our customers using it.

    My company's biggest customer still has Server 2003 and my second-biggest has only recently finished migrating from XP to Win7 (bonus: they're currently considering migrating off IE8).

    You can feel free to tell your customers that they have no valid reason to not be using "modern" OSes, but they'll likely disagree with you.

    I am not in the position of dictating what software my customers use. If my product isn't compatible with their environment, they won't buy it, so I ensure that my product is supported by whatever environment they have (at least for customers that are willing to pay for the support).

  34. pinging @Joshua says:

    Is there a reason for file servers not to keep on-disk metadata then? (It seems like that they do anyway from what you've said earlier.)

  35. Engywuck says:

    @Gabe: the problem is: sometimes the companies themselves *can't* update their servers, at least not all of them.

    For example, after we updated our DCs and File Servers to 2008R2 one production machine suddenly couldn't connect to the share where the production data was held. The reason: in 2006 the control panel with central processing of that machine was replaced, with Win98 as the control OS! (I'd have partially understood NT). But Win98 doesn't "speak" modern security, which is enabled by default in 2008R2. Nearly the same for another machine which we bought new in 2012 or so: control OS was XP ("we get support till 2017" – that's what they said. Great.).

    The "reason" in both cases: those who develop production machines develop their controls on one OS and have to recertify when changing the platform, which is really non-cheap. If you as customer are really lucky you are allowed to install updates (or they have mechanisms to reset on reboot). And no, you don't have a choice: the number of vendors of those specific machines is small (as in "one digit").

    We *really* try to only have modern OS's in our network, but… well… *perhaps* we lose the last W2k Server next year. If nothing bad happens. As we do lose the W2k Pro systems, which were needed for a LOB software written in QBasic and accessing some hardware. Maybe. If the programmer gets the departments to sign the new software off.
