Passing Strings to Unmanaged Code

I've just come across a nasty bug in some sample code (from us, I'm ashamed to say) that highlights the pitfalls of passing string buffers between managed and unmanaged code.

To go back a step or two, I've been trying to create a small application to pull metadata out of Windows Media files so that I can catalogue my music collection. (Incidentally, there are several supported ways to achieve this, including the Windows Media Player SDK and the Windows Media Format SDK.) I'd come across a little function that iterated through all the metadata attributes in a file and dumped them to the console. But for some reason, the function only seemed to be printing the attribute names and not the associated values. The statement looked something like this:

    Console.WriteLine("* {0, 3}  {1, 25} {2, 3}  {3, 3}  {4, 7}  {5}", 
      wIndex, pwszName, wStream, wLangID, pwszType, pwszValue);

According to the debugger, I was seeing the contents of wIndex and pwszName, but none of the other parameters. Stranger still, when I preceded the Console.WriteLine call with a similar call to MessageBox.Show, the message box displayed all the parameters. Needless to say, when you get into the kind of debugging situation where you're seeing truly unexpected results, you often disappear down a blind alley trying to solve a problem that doesn't exist. In my case, I started testing the hypothesis that it was a timing issue that the message box happened to mask; I wasted several hours experimenting with wait loops and searching through the documentation for references to file status that, with hindsight, couldn't have fixed the problem.

Suddenly it came to me in a flash: the debugger was showing the value of pwszName as "Duration\0". Of course! There was a null-termination character at the end of the string that shouldn't have been there. It wasn't that the call to Console.WriteLine didn't contain the right parameters; the output was simply being cut off at the \0. Because pwszName is the second item in the format string, everything after it on the line disappeared, which is exactly what I had been seeing. MessageBox.Show obviously deals with this differently.
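To make the point concrete, here's a minimal sketch showing that a managed string is length-counted rather than null-terminated, so a stray \0 is just another character as far as .NET is concerned. The class and the TrimAtFirstNull helper are my own illustrative names, not part of the original sample.

```csharp
using System;

public static class EmbeddedNullDemo
{
    // Returns the text up to (not including) the first '\0', or the whole
    // string when it contains no null character.
    public static string TrimAtFirstNull(string s)
    {
        int i = s.IndexOf('\0');
        return i < 0 ? s : s.Substring(0, i);
    }

    public static void Main()
    {
        string raw = "Duration\0";                        // what the debugger showed
        Console.WriteLine(raw.Length);                    // 9: the '\0' is counted
        Console.WriteLine(raw == "Duration");             // False
        Console.WriteLine(TrimAtFirstNull(raw).Length);   // 8
    }
}
```

Anything downstream that treats \0 as an end-of-string marker will stop there, even though the managed string carries on.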

So how had pwszName got created like this? Looking back at the sample code that generated the values, I saw something like the following:

    string pwszName = null;
    ushort wNameLen = 0;
    // First call: no buffer supplied, so only the required lengths come back.
    HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                     ref wStreamNum,
                                     pwszName,
                                     ref wNameLen,
                                     out wAttribType,
                                     pbAttribValue,
                                     ref wAttribValueLen );
    // Second call: fill a string pre-populated with wNameLen null characters.
    pwszName = new String( (char)0, wNameLen );
    HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                     ref wStreamNum,
                                     pwszName,
                                     ref wNameLen,
                                     out wAttribType,
                                     pbAttribValue,
                                     ref wAttribValueLen );

It's pretty clear from this piece of code what's wrong: the creator (presumably a C++ programmer, judging by the code style) has called the function once to determine the length of the retrieved string and then called it a second time to fill a pre-populated string. Judging by what the debugger showed, the reported length includes the terminator, so the filled string still ends in \0. They forgot to trim the final null value(s), with a statement such as the following:

    pwszName = pwszName.TrimEnd('\0');
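The two-call pattern is easier to see in isolation. The sketch below uses a purely managed stand-in, GetName, which I've invented to mimic the behaviour described above (report the length including the terminator, then fill the buffer); it is not the real Windows Media interface, just an illustration of where the stray \0 comes from and where the trim belongs.

```csharp
using System;

public static class TwoCallDemo
{
    // Hypothetical stand-in for the native call: with a null buffer it only
    // reports the required length, terminator included; otherwise it copies
    // the name plus a trailing '\0' into the buffer.
    public static void GetName(char[] buffer, ref ushort length)
    {
        const string name = "Duration";
        ushort required = (ushort)(name.Length + 1);
        if (buffer != null)
        {
            if (length < required) throw new ArgumentException("buffer too small");
            name.CopyTo(0, buffer, 0, name.Length);
            buffer[name.Length] = '\0';
        }
        length = required;
    }

    public static string ReadName()
    {
        ushort length = 0;
        GetName(null, ref length);            // first call: query the size
        char[] buffer = new char[length];
        GetName(buffer, ref length);          // second call: fill the buffer
        // Drop the terminator that the reported length includes.
        return new string(buffer, 0, length).TrimEnd('\0');
    }

    public static void Main()
    {
        string name = ReadName();
        Console.WriteLine("{0} ({1} chars)", name, name.Length); // Duration (8 chars)
    }
}
```

Without the TrimEnd call, ReadName would return a nine-character string ending in \0, which is the situation the sample code ended up in.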

Even this is not a great way of handling string buffers. Strings are immutable in .NET, so letting unmanaged code write into one is asking for trouble. A far better approach would have been to use the System.Text.StringBuilder class, a mutable string type that can be passed wherever an API function expects a writable string buffer. Rather than trimming the returned string, I rewrote the API declaration to use a StringBuilder rather than a fixed-length string and changed the sample code accordingly:

    StringBuilder pwszName = null;
    ushort wNameLen = 0;
    // First call: no buffer supplied, so only the required lengths come back.
    HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                     ref wStreamNum,
                                     pwszName,
                                     ref wNameLen,
                                     out wAttribType,
                                     pbAttribValue,
                                     ref wAttribValueLen );
    // Second call: pass a StringBuilder with enough capacity for the name.
    pwszName = new StringBuilder(wNameLen);
    HeaderInfo3.GetAttributeByIndex( wAttribIndex,
                                     ref wStreamNum,
                                     pwszName,
                                     ref wNameLen,
                                     out wAttribType,
                                     pbAttribValue,
                                     ref wAttribValueLen );
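For comparison, here's how the same idea might look against a plain Win32 function via P/Invoke. This is a hedged, Windows-only sketch of my own: GetWindowsDirectory is a real kernel32 export, but the wrapper class, the method name and the initial capacity of 260 are choices I've made purely for illustration.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

public static class WindowsDirectoryDemo
{
    // Windows only. Declaring the buffer as a StringBuilder lets the
    // marshaler hand the native function a writable character buffer.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetWindowsDirectory(StringBuilder lpBuffer, uint uSize);

    public static string GetWindowsDirectoryPath()
    {
        var buffer = new StringBuilder(260);   // assumed starting capacity
        uint length = GetWindowsDirectory(buffer, (uint)buffer.Capacity);
        if (length == 0) throw new Win32Exception();

        if (length > buffer.Capacity)
        {
            // Buffer was too small: the return value is the size required.
            buffer = new StringBuilder((int)length);
            length = GetWindowsDirectory(buffer, (uint)buffer.Capacity);
            if (length == 0) throw new Win32Exception();
        }

        // No trimming needed: the result ends where the native string ended.
        return buffer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(GetWindowsDirectoryPath());
    }
}
```

The string that comes back from ToString carries no trailing terminator, so there is nothing left to trim by hand.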

The moral of the story: whenever you need to pass a string buffer to a Windows API call, use StringBuilder. (Of course, string is just fine if the unmanaged function doesn't modify its contents.) And if you're wondering why a string is being prematurely truncated, make sure you check for rogue null-termination characters!