Why do structures get tag names even if there is a typedef?


As we noted last time, structure tags are different from the typedef name as a historical artifact of earlier versions of the C language. But what about just leaving out the name entirely?

typedef struct {
 ...
} XYZ;

One problem with this approach is that it becomes impossible to make a forward reference to this structure because it has no name. Give the structure a tag, and the problem goes away. For example, if you want to write a prototype for a function that takes one of these structures, and you cannot be sure that the header file defining the XYZ type has already been included, you can still refer to the structure by its tag name.

// in header file A
typedef struct tagXYZ {
 ...
} XYZ;

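// in header file B
struct tagXYZ; // file-scope forward declaration, so that C does not treat the tag as local to the prototype
BOOL InitializeFromXYZ(const struct tagXYZ *pxyz);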

The two header files can be included in either order because header file B uses a forward reference to the XYZ structure. Naturally, you would hope that people would include header file A before header file B, but there can be cases where it is not practical. (For example, header file A may contain definitions that conflict with something else that the program needs, or header file A may change its behavior based on what has already been #define'd, and you don't want to include it before the application has a chance to set up those #defines.)
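Here is a minimal sketch of the "either order" case, with both headers collapsed into one file and header B's contents placed first. The member `value` is hypothetical (the original elides the structure's contents), and `bool` from `<stdbool.h>` stands in for the Windows `BOOL` so that the sketch compiles without any Windows headers. The file-scope `struct tagXYZ;` line matters in C: a tag that first appears inside a parameter list is visible only within that prototype, so without it the later definition would name a different type.

```c
#include <stdbool.h>
#include <stddef.h>

/* --- contents of "header B": needs only the tag, not the definition --- */
struct tagXYZ;  /* incomplete type, declared at file scope */
bool InitializeFromXYZ(const struct tagXYZ *pxyz);

/* --- contents of "header A", arriving later: the full definition --- */
typedef struct tagXYZ {
    int value;  /* hypothetical member; the original elides the contents */
} XYZ;

/* The type is complete by now, so the implementation can use the typedef
   name and reach into the members. "const XYZ *" and
   "const struct tagXYZ *" are the same type, so this matches the
   prototype above. */
bool InitializeFromXYZ(const XYZ *pxyz)
{
    return pxyz != NULL && pxyz->value >= 0;
}
```

With an anonymous `typedef struct { ... } XYZ;` there would be nothing for header B to write in place of `struct tagXYZ`.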

But a more important reason to avoid anonymous types is that it creates problems for MIDL.

Okay, it doesn't actually create problems for MIDL. MIDL handles it just fine, but the way MIDL handles it creates problems for you. When you create an anonymous type in MIDL, such as the anonymous structure above or an anonymous enumeration like this:

typedef enum { ... } XYZ;

MIDL auto-generates a name for you. For example, the above enumeration might end up in the generated header file as

typedef enum __MIDL___MIDL_itf_scratch_0000_0001
{
    ...
} XYZ;

The kicker is that the auto-generated name changes if you change the IDL file. And since typedefs are just shorthand for the underlying type (rather than types in and of themselves), the name saved in the PDB is the unwieldy __MIDL___MIDL_itf_scratch_0000_0001. Try typing that into the debugger. Yuck.
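The "typedefs are just shorthand" point can be seen from inside the language as well. The following is a small C11 sketch, not anything MIDL or the debugger does: it uses `_Generic` to ask what type a variable declared through the typedef really has. The member `value`, the macro `TYPE_NAME`, and the function `NameOfTypedefVariable` are all made up for illustration.

```c
typedef struct tagXYZ {
    int value;  /* hypothetical member */
} XYZ;

/* Map an expression's type to a printable name. There is no separate
   "XYZ" branch to write, because XYZ and struct tagXYZ are one and the
   same type; the typedef only adds a second spelling. */
#define TYPE_NAME(x) _Generic((x), \
    struct tagXYZ: "struct tagXYZ", \
    default:       "something else")

const char *NameOfTypedefVariable(void)
{
    XYZ viaTypedef = { 0 };  /* declared using the typedef name */
    return TYPE_NAME(viaTypedef);
}
```

The variable declared as `XYZ` selects the `struct tagXYZ` branch. When the tag is one you never wrote yourself, that auto-generated tag is the only real name the type has.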

Furthermore, having the name change from build to build means that you have to make sure code libraries are all built from exactly the same header file versions, even if the changes are ostensibly compatible. For example, suppose you compile a library with a particular version of the header file, and then you add a structure to the MIDL file which has no effect on the functions and structures that the library used. But still, since you changed the MIDL file, this changes the auto-generated symbol names. Now you compile a program with the new header file and link against the library. Result: A whole bunch of errors, because the library, say, exports a function that expects its first parameter to be a __MIDL___MIDL_itf_scratch_0000_0001 (because the library was built from the older MIDL-generated header file), but your program imports a function that expects its first parameter to be a __MIDL___MIDL_itf_scratch_0001_0002 (because you compiled with the newer MIDL-generated header file).
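The root of the mismatch is that two structures with different tags are different types, even when their contents are identical. The C11 sketch below shows only that part. The tags `GEN_0000_0001` and `GEN_0001_0002` are shortened, hypothetical stand-ins for the MIDL-generated names, and the member `value` is invented. The link errors themselves presumably arise because C++ decorated function names encode the tag of each parameter type; that step is an assumption about the toolchain and is not reproduced here.

```c
/* Stand-ins for the same anonymous type as emitted by two generations
   of a generated header. Identical layout, different tags. */
struct GEN_0000_0001 { int value; };  /* older header: the library's view */
struct GEN_0001_0002 { int value; };  /* newer header: the program's view */

/* _Generic accepts both associations only because the two types are
   distinct and incompatible; if they were the same type, listing both
   would be a compile-time error. */
#define TAG_OF(x) _Generic((x), \
    struct GEN_0000_0001: "GEN_0000_0001", \
    struct GEN_0001_0002: "GEN_0001_0002")

const char *TagSeenByLibrary(void)
{
    struct GEN_0000_0001 v = { 0 };
    return TAG_OF(v);
}

const char *TagSeenByProgram(void)
{
    struct GEN_0001_0002 v = { 0 };
    return TAG_OF(v);
}
```

A hand-written tag such as `tagXYZ` stays the same no matter what else is added to the IDL file, so both sides keep agreeing on the type's name.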

What's more, when you update the header file, your source control system will recognize hundreds of changes, since the MIDL compiler generated a whole different set of names which no longer match the names from the previous version of the header file, even though you didn't change the structure! This isn't fatal, but it makes digging through source code history more of an ordeal since the "real changes" are buried amidst hundreds of lines of meaningless changes.

Now, this particular rule of thumb is not universally adhered to in Windows header files, in large part, I believe, simply because people aren't aware of the potential for mischief. But maybe now that I've written it up, people might start paying closer attention.

Comments (17)
  1. And is there a reason not to use this?

    struct XYZ {

     …

    };

  2. And is there a reason not to use this?

    struct XYZ {

     …

    };

  3. And is there a reason not to use this?

    struct XYZ {

     …

    };

  4. Skywing says:

    It is also required for the debugger.  The debugger requires the use of the UDT tag (e.g. struct tag) when fetching type information, and not the typedef.

    (An unpredictable guid-like struct tag is generated for anonymous structs, as I recall.  You can’t reference a type via a typedef, only the struct tag, using the .pdb codeview format.)

    • S
  5. f0dder says:

    Slijkerman: C vs. C++. With the "typedef struct {…} MYTYPE;", you can do "MYTYPE myvar;". With "struct MYTYPE { … };" you need "struct MYTYPE myvar;" in C.

  6. required says:

    And is there a reason not to use this?

    Yeah, there is. Apparently the compiler somehow includes the header file three times if you do.

  7. required says:

    My question is: why does the midl compiler choose different names on different occasions? I mean, you know, computers are fairly predictable things which, when you tell them to do X, Y, and Z will do X, Y, and Z the same way each time. They’re not Sirius Cybernetics Corporation elevators.

    [Look at some MIDL-generated files and it won’t take long before you figure out the algorithm. Hint: 0001, 0002… -Raymond]
  8. nksingh says:

    @required:

    How would the MIDL compiler choose the same name as last time?  The MIDL compiler probably chooses its current naming convention because no sane programmer would and therefore there’s a lower risk of collision.  If for instance, MIDL chose MIDL_type or something like that, a programmer just might have already used that name and we’d be up a creek without a paddle.

  9. sandman says:

    I don’t follow the point about it being a problem with the version control system – unless you are checking generated files into the VCS.

    Isn’t that generally considered bad practice?

  10. theorbtwo says:

    Ray’s last point is also a good example of the rule "generated files shouldn’t go in your version control system".

  11. nksingh says:

    @sandman, theorbtwo

    Windows is pretty big and there are tons of generated files that change seldom/never (after all, if they change all the time, it’s hard to maintain compatibility).  Hundreds of windows builds are produced every day from different groups, so it makes sense that if you’re going to make a change to a generated file that’s shared across parts of Windows (local generated files don’t matter) that you should bear the costs of generating the file rather than imposing it on every build machine out there.  Some of those rules might work for normal projects which build relatively quickly, but can’t work for Windows.

  12. An even stronger rule than that one is "never say never".  :-)

    I think that rule is more of a guideline for frequently generated files than something that someone should make as a hard, fast rule.

  13. Ian Johns says:

    Here’s a related article on the subject that helped me recently realize how to properly use tag names to forward declare structures :

    http://www.embedded.com/columns/programmingpointers/9900748

  14. Michiel says:

    The reason to use forward declarations instead of including the header is not just to break cyclic dependencies. It also speeds up compilation.

    A compiler faced with a #include "X.h" may need to look in a dozen locations, and it can’t really cache that information during a build (the "X.h" file may be the output of another build step)

    This isn’t really new; see Lakos’ Large Scale C++ Design.

  15. movl says:

    "Hundreds of windows builds are produced every day from different groups"

    Just wow! I take it Microsoft is really advanced these days in the field of quantum computing? Seriously, Mozilla for example is pretty slim in comparison but I can hardly build it overnight. To build NTOS+WOW+Shell+the rest in a day is incredible.

  16. required says:

    > it won’t take long before you figure out the algorithm

    Well, my point was “why is the algorithm such that it may generate different names on subsequent runs”. Based on the example you gave (“typedef enum { … } XYZ;”) I don’t see why it could not generate (for example):

    typedef enum __MIDL___MIDL_itf_XYZ

    {

       …

    } XYZ;

    since duplicate type declarations are invalid AFAIK (i.e. declaring two types in your original file called XYZ would be an error anyway).

    IMO, an algorithm seems to have been chosen which will cause this problem; it isn’t a problem inherent in generating names.

    [It gives different names because you changed the MIDL file between runs. Obviously if you change the input file, then it’s not unreasonable that the output file changes too! -Raymond]
  17. tim says:

    I’ve always wondered why the “tag” prefix is used. That is, why not just write it as:

    typedef struct XYZ {

    } XYZ;

    instead of:

    typedef struct tagXYZ {

    } XYZ;

    [I’ve always wondered whether people read the first sentence of the blog entry before posting a comment. -Raymond]
