What's wrong with this code, part 15 - the answers

Well, that was certainly interesting.  My last "what's wrong with this code" post had to do with a real-world bug in some of the notification infrastructure that's used for audio.  I took some sample code done while developing the prototype, and stripped out anything that would have described the actual infrastructure (NDA issues, sorry :)) to make the example.  As a result, except for typos (like one case where I missed some old code, and an incorrect typo), the code is exactly like I saw it.

As was mentioned in the comments , the actual bug was here:

        hr = HRESULT_FROM_WIN32(TraceEvent(_TraceHandle, (PEVENT_TRACE_HEADER)&notificationHeader._TraceHeader)); 

The problem is that the HRESULT_FROM_WIN32 macro is defined as:

#define HRESULT_FROM_WIN32(x) ((HRESULT)(x) <= 0 ? ((HRESULT)(x)) : \                    ((HRESULT) (((x) & 0x0000FFFF) | (FACILITY_WIN32 << 16) | 0x80000000)))  

Unfortunately, that macro evaluates its argument twice - once when determining if the value is less than 0, the second time when forming the actual error code.

For some APIs without side effects (like GetLastError()), this isn't a big deal, but for APIs that DO have side effects (basically any API that returns a Win32 error code), it can be catastrophic.  The problem we had was relatively minor (we sent doubled notifications), but this pattern can easily result in memory and handle leaks.  Mike Dimmick pointed out that for Vista, this issue has been addressed, there is now an optional in-line version of HRESULT_FROM_WIN32 that doesn't have this problem, unfortunately, this code didn't use it :(

 

So Kudos and mea culpas:

First the mea culpas:  There were two typos in the original example - I left in some old code, and didn't get the type of one of my casts correct.

Kudos: Marvin was the first person who caught that HRESULT_FROM_WIN32 was a macro.  Close on his heals was K.T., Thales and Harold.

Now for other things that came up in the comments thread.

A number of people had concerns about the use of the "notificationBuffer = (PNOTIFICATION_BLOCK)(notificationHeader+1)" construct.  The format of the buffer in question has a common structure (a notification_header) at the head, and  a variable structure (the NOTIFICATION_BLOCK) following.  the "notificationHeader+1" returns an appropriately aligned structure located immediately after the NOTIFICATION_HEADER into the buffer.  Given that the block was allocated using sizeof(NOTIFICATION_HEADER), they correctly noticed that if NOTIFICATION_HEADER was misaligned, this could cause significant issues.  The +1 increments the notificationHeader by the number of bytes in a NOTIFICATION_HEADER, Tim Smith called out the line in the C++ standard where it shows that behavior is required.

A number of other people were confused about the NotificationBlock->BlockSize paradigm.  The idea is that the caller provides a self-describing buffer (NotificationBlock), and this routine packs it into the resulting block.  The NOTIFICATION_BLOCK provided by the caller had to be at least NOTIFICATION_BLOCK bytes long.   There IS an error here, because the caller doesn't provide the length of the block separately, I can't validate that the NotificationBlock->BlockSize is valid - I'm relying on the C++ compilers type safety to handle this - it may be wrong, but...

Once again, throwing new raised its ugly head, as did several people who wanted me to use exception handling as an error propagation mechanism, as did probing the NotificationBlock parameter for validity.  I've touched on this before - RAII has its own set of issues here, and I always use the non throwing new.  Oh, and someone was upset at the presence of a "goto" in the code.  Would it be better if I used a "do {} while (FALSE);" and then used "break" instead of the goto?  There's no semantic difference between the two.

There was also a potential issue of arithmetic overflow or underflow - since this function is internal-only, I'm not particularly worried about arithmetic overflow issues - if this was a public API, I would, but our team controls all the users of this function, so...