Eric’s Complete Guide To BSTR Semantics

If you've ever done any C++ or C programming that used COM objects, you'll certainly have seen code like this:


What is this BSTR thing, and how does it differ from WCHAR* ?

Low-level languages like C or C++ allow you great freedom in deciding which patterns of bits are used to represent certain concepts.  Unicode strings are an excellent example.  The standard way to represent an n-character Unicode string in C++ is as a pointer to a 2 x (n + 1) byte buffer where the first 2 x n bytes are unsigned short integers representing the characters and the final two bytes in the buffer are zeros, terminating the string.

For notational convenience we shall take a page from Hungarian notation and call such a beast a PWSZ, short for "Pointer to Wide-character String, Zero-terminated".  As far as the C++ type system is concerned, a PWSZ is an unsigned short *.

COM uses a somewhat different approach to storing string data, an approach which is sufficiently similar to allow good interoperability between code expecting PWSZs and code providing COM strings. Unfortunately they are sufficiently different that the subtle differences can cause nasty bugs if you are not careful and cognizant of those differences.

COM code uses the BSTR, short for "Basic String", to store a Unicode string. (It is so called because this method of storing strings was developed for OLE Automation, which was at the time motivated by the development of the Visual Basic language engine.)

From the compiler's point of view a BSTR is also an unsigned short *.  The compiler will not care if you use BSTRs where PWSZs are expected and vice-versa.  But that does not mean that you can do so with impunity!  There would not be two names for the same thing if they were not in some way different; these two things are different in a number of ways.

In most cases a BSTR may be treated as a PWSZ.  In almost no cases may a PWSZ be treated as a BSTR.

Let me list the differences first and then discuss each point in excruciating detail.

1) A BSTR must have identical semantics for NULL and for "".  A PWSZ frequently has different semantics for those.

2) A BSTR must be allocated and freed with the SysAlloc* family of functions.  A PWSZ can be an automatic-storage buffer from the stack or allocated with malloc, new, LocalAlloc or any other memory allocator.

3) A BSTR is of fixed length.  A PWSZ may be of any length, limited only by the amount of valid memory in its buffer.

4) A BSTR always points to the first valid character in the buffer.  A PWSZ may be a pointer to the middle or end of a string buffer.

5) When allocating an n-byte BSTR you have room for n/2 wide characters.  When you allocate n bytes for a PWSZ you can store n / 2 - 1 characters -- you have to leave room for the null.

6) A BSTR may contain any Unicode data including the zero character.  A PWSZ never contains the zero character except as an end-of-string marker.  Both a BSTR and a PWSZ always have a zero character after their last valid character, but in a BSTR a valid character may be a zero character.

7) A BSTR may actually contain an odd number of bytes -- it may be used for moving binary data around.  A PWSZ is almost always an even number of bytes and used only for storing Unicode strings.

Over the years I've found and fixed many bugs where the author assumed that a PWSZ could be used as a BSTR or vice-versa and thereby violated one of these differences.  Let's dig in to those differences:

1) If you write a function which takes an argument of type BSTR then you are required to accept NULL as a valid BSTR and treat it the same as a pointer to a zero-length BSTR.  COM uses this convention, as do both Visual Basic and VBScript, so if you want to play well with others you have to obey this convention.  If a string variable in VB happens to be an empty string then VB might pass it as NULL or as a zero-length buffer -- it is entirely dependent on the internal workings of the VB program.

That's not usually the case with PWSZ-based code.  Usually NULL is intended to mean "this string value is missing", not as a synonym for an empty string. 

In COM if you have some datum which could be valid or could be missing then you should store it in a VARIANT and represent the missing value with VT_NULL rather than interpreting a NULL string as different from an empty string.
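One way to honor the NULL-equals-empty rule from point 1 is to normalize the argument once, at the top of the function. What follows is a minimal portable sketch, not Windows code: char16_t stands in for the 16-bit characters of a BSTR, and the names FakeBstr, BstrOrEmpty and CopyName are hypothetical, invented for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a BSTR argument: a pointer to 16-bit characters
// that may legally be null. The real type is BSTR from the Windows headers.
using FakeBstr = const char16_t*;

// Maps NULL to a pointer to an empty string so that the rest of the
// function only ever has to handle one case.
inline FakeBstr BstrOrEmpty(FakeBstr bstr) {
    return bstr ? bstr : u"";
}

// Example callee. Without the normalization, handing NULL to the
// std::u16string constructor would be undefined behavior.
// Note: this stand-in has no length prefix, so the copy stops at the
// first zero character; a real BSTR callee would use the stored length.
inline std::u16string CopyName(FakeBstr bstrName) {
    return std::u16string(BstrOrEmpty(bstrName));
}

// Usage: NULL and "" are indistinguishable to the callee.
inline void NullIsEmptyDemo() {
    assert(CopyName(nullptr) == CopyName(u""));
    assert(CopyName(nullptr).empty());
}
```

The design choice here is to normalize in one place rather than sprinkle NULL checks through the function body, which is where the bugs tend to creep in.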

2) BSTRs are always allocated and freed with SysAllocString, SysAllocStringLen, SysFreeString and so on.  The underlying memory is cached by the operating system and it is a serious, heap-corrupting error to call free or delete on a BSTR.  Similarly it is also an error to allocate a buffer with malloc or new and cast it to a BSTR.  Internal operating system code makes assumptions about the layout in memory of a BSTR which you should not attempt to simulate. 

PWSZs on the other hand can be allocated with any allocator or allocated off the stack. 

3) The number of characters in a BSTR is fixed.  A ten-byte BSTR contains five Unicode characters, end of story.  Even if those characters are all zeros, it still contains five characters.  A PWSZ on the other hand can contain fewer characters than its buffer allows:

WCHAR pwszBuf[101];
pwszBuf[0] = L'X';
pwszBuf[1] = L'\0';

pwszBuf is a one-character string which may be lengthened to as many as 100 characters or shrunk to a zero-character string.

4) A BSTR always points to the first valid character in the buffer.  This is not legal:

BSTR bstrName = SysAllocString(L"John Doe");
BSTR bstrLast = &bstrName[5]; // ERROR

bstrLast is not a legal BSTR.  That is perfectly legal with PWSZs though:

const WCHAR * pwszName = L"John Doe"; // string literals are const
const WCHAR * pwszLast = &pwszName[5];

5) and 6) The reasons for the above restrictions make more sense when you understand how exactly a BSTR is really laid out in memory, and this also explains why allocating an n-character BSTR gives you room for n characters, not n-1 like a PWSZ allocator.

When you call SysAllocString(L"ABCDE") the operating system actually allocates sixteen bytes.  The first four bytes are a 32 bit integer representing the number of valid bytes in the string -- initialized to ten in this case.  The next ten bytes belong to the caller and are filled in with the data passed in to the allocator.  The final two bytes are filled in with zeros. You are then given a pointer to the data, not to the header.

This immediately explains a few things about BSTRs:

  • The length can be determined immediately.  SysStringLen does not have to count characters looking for a null like wcslen does.  It just looks at the byte count stored immediately before the pointer and converts it to a character count.
  • That's why it is illegal to have a BSTR which points to the middle of another BSTR.  The length field would not be before the pointer.
  • A BSTR can be treated as a PWSZ because there is always a trailing zero put there by the allocator.  You, the caller, do not have to worry about allocating enough space for the trailing zero.  If you need a five-character string, ask for five characters.
  • That's why a BSTR must be allocated and freed by the Sys* functions.  Those functions understand all the conventions used behind-the-scenes.
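To make the layout concrete, here is a minimal portable sketch that mimics it with malloc. It is an illustration only. The names FakeBstr, FakeAllocStringLen, FakeStringByteLen, FakeStringLen and FakeFreeString are hypothetical, char16_t stands in for WCHAR, and the real OLE Automation allocator caches memory and is free to change its layout. Real code should always call the Sys* functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for BSTR; points at the character data, not the header.
using FakeBstr = char16_t*;

// Allocates [4-byte byte count][cch characters][2-byte zero] and returns a
// pointer to the character data. If pch is null the data is zero-filled.
inline FakeBstr FakeAllocStringLen(const char16_t* pch, std::uint32_t cch) {
    const std::uint32_t cb = cch * static_cast<std::uint32_t>(sizeof(char16_t));
    unsigned char* pbBlock = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + cb + sizeof(char16_t)));
    if (!pbBlock) return nullptr;
    std::memcpy(pbBlock, &cb, sizeof cb);              // length header, in bytes
    unsigned char* pbData = pbBlock + sizeof(std::uint32_t);
    if (pch) std::memcpy(pbData, pch, cb);             // the caller's characters
    else     std::memset(pbData, 0, cb);
    std::memset(pbData + cb, 0, sizeof(char16_t));     // trailing zero
    return reinterpret_cast<FakeBstr>(pbData);
}

// Reads the byte count stored just before the data; NULL counts as empty.
inline std::uint32_t FakeStringByteLen(const char16_t* bstr) {
    if (!bstr) return 0;
    std::uint32_t cb;
    std::memcpy(&cb,
                reinterpret_cast<const unsigned char*>(bstr) - sizeof cb,
                sizeof cb);
    return cb;
}

// Character count: no scanning for a terminator is needed.
inline std::uint32_t FakeStringLen(const char16_t* bstr) {
    return FakeStringByteLen(bstr) / static_cast<std::uint32_t>(sizeof(char16_t));
}

// Frees the whole block. Note that the block starts before the data pointer,
// which is why calling free() directly on the data pointer would be wrong.
inline void FakeFreeString(FakeBstr bstr) {
    if (bstr) std::free(reinterpret_cast<unsigned char*>(bstr) - sizeof(std::uint32_t));
}
```

Allocating u"ABCDE" with a count of 5 through this sketch reserves the same sixteen bytes described above: four for the header, ten for the characters and two for the terminator. A pointer into the middle of the data has no header in front of it, so FakeStringByteLen would read garbage there.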

7) Because a BSTR is of a known number of bytes there is no need for the convention that a zero terminates a string.  Therefore zero is a legal value inside a BSTR.  This means that BSTRs can contain arbitrary data, including binary images.  For this reason BSTRs are often used as a convenient way to marshal binary data around in addition to strings.  This means that BSTRs may be, in some odd situations, an odd number of bytes.  It is rare, but you should be aware of the possibility.
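The embedded-zero hazard is easy to demonstrate without any Windows headers. In this hedged sketch std::u16string plays the role of a counted string like a BSTR, and the hypothetical helper CchZeroTerminated counts the way wcslen-style PWSZ code does. Neither name comes from the Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts characters the way PWSZ code does: stop at the first zero.
inline std::size_t CchZeroTerminated(const char16_t* pwsz) {
    std::size_t cch = 0;
    while (pwsz[cch] != u'\0') ++cch;
    return cch;
}

inline void EmbeddedZeroDemo() {
    // Five counted characters, the third of which is a zero.
    const std::u16string counted(u"AB\0CD", 5);

    // The stored length knows about all five characters...
    assert(counted.size() == 5);

    // ...but code that scans for a terminator sees only the first two.
    assert(CchZeroTerminated(counted.c_str()) == 2);
}
```

Any code path that hands such a string to a zero-terminated API silently drops everything after the first zero.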

Whew!  To sum up, that should explain why a BSTR may usually be treated as a PWSZ but a PWSZ may not be treated as a BSTR unless it really is one.  The only situations in which a BSTR may not be used as a PWSZ are (a) when the BSTR is NULL, (b) when the BSTR contains embedded zero characters, because the PWSZ code will think the string is shorter than it really is, and (c) when the BSTR does not in fact contain a string but rather arbitrary binary data.  The only situation in which a PWSZ may be treated as a BSTR is when the PWSZ actually is a BSTR, allocated with the right allocator.

In my own C++ code I avoid misunderstandings by making extremely careful use of Hungarian Notation to keep track of what is pointing to what.  Hungarian Notation works best when it captures semantic information about the variables which is obscured by the type signature.  I use the following conventions:

    bstr --> a real BSTR
    pwsz --> a pointer to a zero-terminated wide character string buffer
    psz  --> a pointer to a zero-terminated narrow character string buffer
    ch   --> a character
    pch  --> a pointer to a wide character
    cch  --> a count of characters
    b    --> a byte
    pb   --> a pointer to a byte
    cb   --> a count of bytes
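As a small illustration of why the cb/cch distinction earns its keep, here is a hedged sketch of the arithmetic from difference 5. The function names are hypothetical and char16_t stands in for WCHAR, so this compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>

// cb = count of bytes, cch = count of (16-bit) characters.

// A PWSZ buffer must reserve one character for the terminating zero.
// A buffer too small to hold even the terminator is reported as 0 here.
inline std::size_t CchMaxInPwszBuffer(std::size_t cbBuffer) {
    const std::size_t cchSlots = cbBuffer / sizeof(char16_t);
    return cchSlots > 0 ? cchSlots - 1 : 0;
}

// A BSTR's byte count covers only the payload; the allocator adds the
// terminator itself, so every requested character is usable.
inline std::size_t CchInBstrPayload(std::size_t cbPayload) {
    return cbPayload / sizeof(char16_t);
}

inline void CountPrefixDemo() {
    char16_t wszBuf[101];
    const std::size_t cbBuf = sizeof(wszBuf);      // 202 bytes
    assert(CchMaxInPwszBuffer(cbBuf) == 100);      // room for 100 characters
    assert(CchInBstrPayload(10) == 5);             // ten bytes, five characters
    (void)wszBuf;
}
```

With the prefixes in place, an expression that mixes a cb with a cch without a multiplication or division by the character size looks wrong at a glance.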

  • Comments (35)
    1. Matthew says:

      Looking at as an example, would I then be correct in assuming that this API could not (hypothetically) use a BSTR for the FileName since it needs to be able to distinguish NULL from a blank string?

    2. dave sanderman says:

      AUGH! ‘psz’ and ‘pwsz’ do NOT mean what you claim they mean! as it is you claim equivalence between ‘psz’ and ‘sz’, and type equivalence between ‘psz’ and ‘pch’. treating ‘psz’ and ‘sz’ as the same screws up type algebra because you can’t cancel ‘p’ and ‘*’ to match types.

      i see a lot of people, and when i was at MSFT i saw a lot of internal MSFT code, that gets this wrong; people often claim that they use ‘psz’ and ‘sz’ interchangeably except that an ‘sz’ is an owned buffer of some kind (e.g. char sz[32]) whereas ‘psz’ is a pointer to a buffer that you don’t own, but again, that beefs the type algebra. ‘pch’, ‘sz’, and ‘rgch’ all imply type of ‘char *’; ‘psz’ is a pointer to a char *, or ‘char **pszFoo;’.

    3. sean says:

      Hey dave, I was just on my way to forward this your way, just to tweak you on the psz/sz thing. Guess I don’t have to bother now. Glad you’re still fighting the good fight.

    4. Eric Lippert says:

      Indeed, I have seen many inconsistent uses of this at Microsoft and elsewhere. And yours makes a lot of sense. But surely what’s most important is to pick a convention and stick with it? No convention will be perfect.

    5. Eric Lippert says:

      Matthew: Indeed, that API would be a poor candidate for a BSTR, for several reasons, not the least of which is the NULL BSTR issue. Note also that it takes a pointer to TCHARs, which means that it has a different type signature depending on whether it is being used with or without UNICODE. BSTRs, by contrast, are always 16 bit characters on any Win32 operating system.

    6. Raymond Chen says:

      sz/psz is problematic because the language fights you. The distinction is important (sizeof(sz) is very different from sizeof(psz), ask any security expert), but the C/C++ language’s autodecay rule means that when you say "sz" it turns into "psz" in rvalue context. In most code that I see, psz means "pointer to array of characters, null-terminated" and sz means "array of characters, null-terminated"; i.e., both are type "char*" when used as an rvalue. (pch is also of type "char*" but does not imply null termination.)

    7. C-J Berg says:

      I’ve seen plenty of documentation for C++ programmers stating that a missing BSTR should be represented as "BSTR b = SysAllocString("")" (or equivalent), when in fact using NULL would be much more convenient and of course would not consume any resources. Here’s an example, "Visual C++ ADO Programming" from ADO 2.8’s documentation:

      "Coding a Missing Parameter — String
      When you need to code a missing String operand in Visual Basic, you merely omit the operand. You must specify the operand in Visual C++. Code a _bstr_t that has an empty string as a value.

      _bstr_t strMissing(L"");"

    8. > ‘psz’ is a pointer to a char *, or ‘char **pszFoo;’

      Dave, I don’t think I’m with you on that one. I’d call a char** a ppsz, not a psz.

      Like this:

      char szFoo[] = "foo";
      char* pszFoo = szFoo;
      char** ppszFoo = &pszFoo;

      I think this is by far the most common convention. I’d never seen a char** called a psz until just now.

    9. Johan Ericsson says:

      Originally, why was the decision made to represent a BSTR like this. Why not use a structure?

      struct BSTR {
        long length;
        WCHAR* str;
      };

      Much of the confusion over BSTRs would have been avoided if this had been done.

    10. Eric Lippert says:

      Johan, let’s call your suggestion the "transparent" ideal and the actual implementation the "opaque" ideal.

      I do not personally know why the original OLEAUT implementor chose to make BSTRs opaque — next time I see him, I’ll ask — but I can make some general observations.

      The nice thing about transparency is that you, the user of the structure, can see at a low level everything that is going on. But that’s also the not-so-nice thing about it — transparent systems lack abstraction, lack information hiding, cannot easily be extended, and so on.

      As an implementor, by keeping the structure of a BSTR hidden and forcing the user to use various functions (SysStringLen, etc) to access the information, you become free to change the implementation details in the future.

      BSTRs have lots of nice invariants — they are always null-terminated, they are always allocated and freed with the same allocator, they are always cacheable, there are consistent marshaling rules, etc. By going to a transparent system you risk the user screwing all those things up. What stops someone from leaking the old buffer and replacing it with a new one? What if the user changes the "length" field?

      And finally, your system would require two memory allocations per string, which is wasteful and error-prone.

    11. Santa Clause says:

      The problem with an opaque implementation is when it doesn’t cover all circumstances, hence the transparent "ideal" is unexpectedly required. But you could never doubt that the most clever programmers make far less mistakes than the least clever.

    12. Aarrgghh says:

      There’s a minor error in the article:

      > WCHAR * pwszBuf[101];

      That’s not a string. It’s an array of pointers to WCHAR.

      > pwszBuf[0] = ‘X’;

      > pwszBuf[1] = ‘\0’;

      There’s nothing wrong with assigning single-byte characters to pointers to WCHAR, of course (they’re all just numbers, after all, and character literals are 32 bits anyway these days), but if you want it to compile you’ll have to cast them, like so:

      pwszBuf[0] = (WCHAR *)’X’;

      pwszBuf[1] = (WCHAR *)’\0’;

      Of course, that’s not what the author was talking about at all. His INTENT was almost certainly this:

      WCHAR wszBuf[101];

      wszBuf[0] = L’X’;

      wszBuf[1] = L’\0’;

      …which is completely different.

      Heck, we’ve ALL checked in code that doesn’t compile. It’s hardly a mortal sin, especially when the code’s in an article that’s not being compiled at all. I just worry about the inexperienced readers who’ll look at the non-compiling example and assume that it MUST be right, and blame themselves for not being able to make much sense out of it. Pointers are confusing enough for beginners even if the example code IS correct, as I’m sure we all remember.

      Leaving aside that one kvetch, of course, it’s a cool article.

      Oh, but there is one more thing: You have to remember that just because somebody takes a BSTR as an argument, that doesn’t necessarily mean he’s really treating it as a BSTR all the way through, inside. E.g., the COM API for WMI treats the WMI "string" type as a VT_BSTR VARIANT, but nowhere does MSDN claim that the WMI "string" type really IS a BSTR, and in fact it isn’t: It’s null terminated. A BSTR goes into the repository and a BSTR comes back out a week later, but in between times it’s not a BSTR at all and anything past the first 0 character is gone, gone, gone. So RTFM, kids.

    13. Eric Lippert says:

      Indeed, you are totally correct — thanks for the note.

      I showed that article to dozens of people before I posted it here, and proofread it many times, and still no one found that bug. I should have run it through a compiler.

      I’ve fixed that as well as a few grammar and formatting errors.

    14. Kavitha says:

      I am a COM beginner and have the same question baffling me as that posted by Matthew. I am using COM to wrap a set of Windows APIs. I need to set an LPWSTR parameter in the API to a NULL pointer from VBScript. What should the corresponding parameter type be in the wrapper COM function so it can distinguish NULL and "". BSTR i can see is not the choice!

    15. Eric Lippert says:

      What I would do is have the COM API take a variant. If it is a not-null VT_BSTR then pass the string to the underlying API. If it is a null VT_BSTR then pass an empty string to the underlying API. If it is VT_NULL, VT_EMPTY or VT_ERROR set to PARAMNOTFOUND then pass NULL to the underlying API. Otherwise, bzzt, type mismatch.

    16. Kavitha says:

      Tried and tested perfectly. Thanks!

    17. Durga says:

      Hi all,

      Can’t we free BSTR string with CoTaskMemFree?



    18. Eric Lippert says:

      No, because (a) the BSTR doesn’t point to the beginning of the allocated block, and (b) you have no guarantees that the next version of OLE Automation will lay out the BSTR the same or use the same allocation strategy. It presently uses a caching strategy built on top of IMalloc but there is no guarantee that it will continue to do so.

      That’s the difference between actual behaviour and documented behaviour. The documented behaviour is what you can rely on, and the documentation says to use SysFreeString, so please do!

    19. Here’s a story that I said a long time ago that I was going to tell you all, and then promptly…

    20. AndySmall says:

      Another difference is in leakage. How do you tell if you’ve forgotten to free a BSTR? The malloc/free and new/delete used for a PWSZ show up in the debug output when closing the app, but a hanging SysAllocString() doesn’t get reported.

    21. Shreya says:

      Great job!  A note on the use of BSTR, and BSTR * as function arguments with reference to pwsz would have made it even better.  Thanks.


    22. Vishal Singh says:

      I want to convert a char array containing embedded NULL (\0) chars to a BSTR.

      e.g. cArr[‘1’,‘2’,‘NULL’,‘3’,‘NULL’,‘4’,‘5’,‘NULL’].

      Is it possible to convert it to a BSTR without losing anything?

      So the _bstr_t variable will contain "12NULL3NULL45NULL"

    23. jparker says:

      How can you create a BSTR from a hex representation? For example, 06F106F206F3 for 1,2,3 in Arabic?

    24. Ethen says:


      How do I convert BSTR to char* and vice-versa?

      How do I convert WCHAR* to char * and vice-versa?

      Also, Is there a direct way to convert between BSTR and WCHAR* ?

    25. eHaq says:

      If we have an interface function with a BSTR parameter, then who will be responsible for the cleanup of this BSTR? Should the function call SysFreeString() on the BSTR parameter, or should the caller be responsible for the cleanup? COM can be called from VB, so there might not be a SysFreeString().

    26. Eric Lippert says:

      An interface is a contract; it describes the behaviour expected of the caller and the callee.  The question of "who owns freeing this memory, and how?" is part of that contract, so, decide who you want to own freeing it, and write that into the documentation of your interface.

      Typically though, you’d expect the following:

      If the BSTR is an "in" parameter, then the caller usually owns freeing it.  If the callee wants to own it, then the callee can make a copy and own the copy.

      If the BSTR is an "out" parameter then it should be null on entry, so that it does not have to be freed, and then obviously the caller owns the resulting string, since the callee is no longer around to free it.

      If the BSTR is an "in/out" parameter then the callee frees the value passed in and replaces it. The caller owns freeing the new value.

      If the BSTR is an "out ret" parameter then obviously the caller frees it.

      VB will expect these rules, and will free strings on your behalf if it is the caller and the caller owns freeing the string.

    27. Mark Christian says:


      Can I extend your last comment "VB will expect these rules, and will free strings on your behalf" to VBScript (classic ASP)?



    28. karthik says:

      Hey please tell me how do i include an argument in method call…

      When i try to do something like…

      BSTR ClassNameInstance = SysAllocString(


      It says: cant add two pointers.

      Please tell me how to do it…

      Thanks in advance

    29. karthik says:

      ProcId is an int which contains the actual pid of the process…

    30. Anonymous Coward says:

      Excellent guide. You might want to consider incorporating some of the better comments (including your own) into the main body of text though.

    31. Correct me if I'm wrong and pleaese add disadvantages & advantages says:

      Advantages of BSTR:

      – faster to find its length

      – faster marshalling process of strings between 2 different computers: one Win32, other one Win64 (actually no conversion needed)


      Disadvantages of BSTR:

      – consume double memory + 2 to store the same string represented as TCHAR on Win32

      – need to learn SysAlloc, SysFree, etc. functions and be careful when using them

      – need to learn some 20 conversion functions and macros, and be careful when using them

      – pitfalls with _bstr_t, CComBSTR and BSTR when people expect to interchange them, so need to learn those too and be careful when using them

    Comments are closed.
