What’s wrong with this code, part 6 – the answers

In yesterdays post I presented a trace log writer that had a subtle bug.

As I mentioned, the problem had nothing to do with the code, but instead, the problem had to do with the directory in which the trace log file was written.

My second hint to the problem was that only symptom that the person who wrote the code saw was that the trace logger would only ever write one line to the file, and then only after the file was deleted.

And it turns out that that’s key to the problem.  Consider a folder with an ACL that allows the user to create files, but not write to them.  This isn’t as strange as you might imagine, it’s a common pattern to grant users the ability to create files in a directory but not to modify them once they’ve been created.  As a simple example, consider your review – you should be allowed to post your review to a shared folder, but once the review’s been posted, it’s a bad idea for you to be able to modify the document.  On the other hand, you do want to be able to read that file after it’s been posted.

When you have a directory that is set up in this fashion, the use can create files (they have the FILE_CREATE_FILE access right to the directory), but they can’t modify the files once they’re created (the OBJECT_INHERIT_ACE | INHERIT_ONLY ACE on the directory only grants GENERIC_READ access to the file).

The thing that’s confusing about this is that the user was allowed to write to the file when it was created, but not when they opened the file subsequently. 

The reason for this is that in NT, a user is granted full control over an object that they create, but once the object has been created, the ACLs on that object apply.  So when the LogMessage API first created the file, it was granted full access to the file.  And the user was able to write their data to the file.  Once they closed the file, however, their full access to the file reverted back to the ACL on the file, so for the second write, the call to CreateFile() failed.

It turns out that this behavior has the potential to hit a large set of applications.  There is a very common design pattern intended to protect an application’s data files from data corruption when saving a file.  This design pattern works by first creating a temporary file, writing the contents of the document to the temporary file, and if the temporary file was written successfully, it deletes the old file and renames the new file over the old file.  It then sets the security descriptor on the renamed file to match the original security descriptor on the file.  The problem is that before the rename happened, the application closed the temporary file.   As a result, the ACL on the newly renamed file is the default ACL for the directory, which may not allow the user to modify the security descriptor on the file.

So what WAS wrong with the code?  Well, the problem is in the call to CreateFile:


The 4th parameter to the CreateFile API is the SECURITY_ATTRIBUTES structure that is to be applied to the new file.  That structure contains inheritance information about the file, but it also the security descriptor to apply to the new file.  In this case, either the TRACELOG_FILE_NAME should be set to a directory that the developer knows grants the user access to the file, or the developer should specify an ACL for the newly created file that grants access to the file.

Oh, and before anyone asks why NT did such a silly thing as granting the user full control over a newly created resource, consider the save your review to the shared folder example above.  If the ACL on the folder applied, then you’d never be able to write the contents of the file in the first place.

This is a general principal of all objects in NT, in case it wasn’t obvious.  It applies to all kinds of handles, from file handles to event handles, to registry handles. I’ve debugged dozens of problems in the past related to this problem – someone creates an object, sets some things on that object, closes the object and later attempts to modify the object and the modification fails.  Just about every time I’ve ever seen it, the problem has been related to the ACL on the object.

 So Kudo’s and Mea Culpa’s:  Once again, Mike Dimmick was the first person to catch on to what I was trying to demonstrate.  He framed it in terms of a limited access service account, but the principal is the same – you can create an object but can’t re-open it for the same access after it’s inherited its ACL from the parent.

And as usual, there were unintentional bugs in the code.  The biggest one was caught by Skywing; he correctly caught the fact that the EnterCriticalSection should be protected with a try/except since it can throw (EDIT: THIS IS NOT TRUE ON WINDOWS XP AND BEYOND, SEE BELOW).  Several people (Keith Moore, Sayavanam, Dave M, caught issues with my use of StringCbVPrintfEx (not checking return codes, not explicitly using the A version of the API.  And Niclas Lindgren had a fascinating series of posts about exception safety in applications (which I’ll address in the near future in a separate post).

Normon Diamond also caught the fact that the cleanup code checked for the handle not being null, but INVALID_HANDLE_VALUE is -1. 

Edit: Pavel Lebedinsky points out in the comments below that on XP and beyond (W2K3, Longhorn) EnterCriticalSection will NOT raise exceptions.  So Skywing's point is actually incorrect.

Comments (6)

  1. Anonymous says:

    > Consider a folder with an ACL that allows

    > the user to create files, but not write

    > to them.

    Note however that users can always change permissions on any files they created, regardless of what the DACL says.

    Your review folder example will work only if you apply additional restrictions at the share level, to prevent users from changing the DACL.

    > EnterCriticalSection should be protected

    > with a try/catch since it can throw

    No it shouldn’t. On XP and later, Enter/LeaveCS never raise exceptions. On Win2K and before, they can raise, but you can’t recover from this, even if you use try/except (try/catch should only be used for C+ exceptions; in Whidbey catch(…) will not even catch SEH exceptions by default).

    The reason you can’t recover is because on Win2K the critical section could be left in a corrupted state when Enter or LeaveCriticalSection throws.

    The bottom line is that you should never wrap Enter/LeaveCriticalSection in a try/except. On XP and later, it’s not necessary, and on Win2K it doesn’t really help anyway.

    If you need critical sections to work reliably in out-of-memory conditions on Win2K your only option is to preallocate the event using InitializeCriticalSectionAndSpinCount.

  2. Anonymous says:

    A good point Pavel.

    And I meant try/except, not try/catch, I’ll update the text.

  3. Anonymous says:

    I never said it raises on Windows XP or later. In those cases it uses a system wide KeyedEvent (new kernel object) to avoid having to allocate memory for each critical section having it’s own wait semaphore.

    Note that MSDN *RECOMMENDS* that you use SEH to catch EnterCriticalSection exceptions ( http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/entercriticalsection.asp ).

    How do you arrive at the conclusion that the critical section will be put into an invalid state if an exception is thrown? RtlLeaveCriticalSection throws an exception if NtSetEvent fails, which can only happen if the structure itself is damaged or somebody messed with the event handle (which are obviously nonrecoverable, but unrelated to running out of memory). I didn’t write the critsec code, but I don’t *think* the lockcount being incremented an extra time will cause any fatal problems — perhaps a lock cmpxchg with 0 when it’s not really necessary, but that oughtn’t break anything.

  4. Anonymous says:

    > How do you arrive at the conclusion that

    > the critical section will be put into an

    > invalid state if an exception is thrown?

    Here are some quotes from Neill Clift (he is the guy who fixed critical sections for XP):

    "It was really impossible to handle exceptions generated by EnterCriticalSection and LeaveCriticalSection because they left the CS in a state you couldn’t recover from."


    "I considered it impossible to code to Enter/Leave raising. To make matters worse the internal structure of the lock was damaged by the raise such that it might be unusable after the event."


    The fact that LeaveCriticalSection can also raise in low memory situations (on Win2K) seems unintuitive at first but it’s true – there’s some obscure scenario where allocation of the event has to happen in LeaveCriticalSection instead of EnterCriticalSection.

  5. Anonymous says:

    Ah, okay. In that case, somebody should update the MSDN documentation to not recommend something which doesn’t work. 🙂

  6. Anonymous says:

    Updating Application Data

Skip to main content