If you wished a language supported the preprocessor, you know, you can fix that


A customer had the following question about the message compiler, something that I had noted almost nobody uses. Well how do you do, we found somebody who actually uses it. Anyway, the question went like this (paraphrased, as always):

Can I use symbolic constants in my .mc file? For example, I have a message file that goes like this:

SymbolicName=ERROR_XYZ_TOO_LARGE
The XYZ parameter cannot exceed 100.
.

SymbolicName=ERROR_ABC_TOO_SMALL
The ABC parameter must be at least 1.
.

SymbolicName=ERROR_CANT_COMBINE_ABC_WITH_XYZ
You cannot specify values for both ABC and XYZ.
.

I have symbols defined in a header file #define MINIMUM_ABC_VALUE 1 and #define MAXIMUM_XYZ_VALUE 100 that I, of course, have to keep in sync with the error messages. One way to do this is to change the messages:

SymbolicName=ERROR_XYZ_TOO_LARGE
The XYZ parameter cannot exceed %1!d!.
.

SymbolicName=ERROR_ABC_TOO_SMALL
The ABC parameter must be at least %1!d!.
.

And in my function that prints error messages, I can insert these magic parameters:

error = DoMyThing(...);

if (error != ERROR_SUCCESS) {
 switch (error) {
 case ERROR_ABC_TOO_SMALL:
  Insertion = MINIMUM_ABC_VALUE;
  break;
 case ERROR_XYZ_TOO_LARGE:
  Insertion = MAXIMUM_XYZ_VALUE;
  break;
 case ERROR_CANT_COMBINE_ABC_WITH_XYZ:
  Insertion = 0; // not used
  break;
 ... repeat for other error messages...
 }
 DWORD_PTR Parameters[1] = { Insertion };

 FormatMessage(FORMAT_MESSAGE_ARGUMENT_ARRAY ...
     ..., error, ..., (va_list*)&Parameters)...
}

This is obviously a rather high-maintenance approach. Is there some way I could just write, say,

SymbolicName=ERROR_XYZ_TOO_LARGE
The XYZ parameter cannot exceed {MAXIMUM_XYZ_VALUE}.
.

SymbolicName=ERROR_ABC_TOO_SMALL
The ABC parameter must be at least {MINIMUM_ABC_VALUE}.
.

and have the message compiler do the substitution? It would be great if it could even take the values from my header files.

This is a case of standing right next to the answer and not even realizing it.

There's no law that says that you're not allowed to use any other tools. It so happens that the preprocessor is a handy tool. If you want the preprocessor to run over your message files before they go into the message table, then why not run the preprocessor over your message files before they go into the message table?

#include "qqlimits.h" // pretend the program's name is "qq"

...

SymbolicName=ERROR_XYZ_TOO_LARGE
The XYZ parameter cannot exceed MAXIMUM_XYZ_VALUE.
.

SymbolicName=ERROR_ABC_TOO_SMALL
The ABC parameter must be at least MINIMUM_ABC_VALUE.
.

SymbolicName=ERROR_CANT_COMBINE_ABC_WITH_XYZ
You cannot specify values for both ABC and XYZ.
.

Give this file a name like, say, qq.mcp, and add a rule to your makefile:

qq.mc: qq.mcp qqlimits.h
  cl /EP /Tc qq.mcp >qq.mc

Make your changes to qq.mcp, and when you build, the makefile will preprocess it and generate the qq.mc file, which you can then compile with the message compiler just like you were doing before.
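
One thing to watch out for is that everything in the header that isn't a preprocessor directive passes straight through into qq.mc, where the message compiler won't know what to do with it. Here is a rough sketch of how the header could protect itself, in the same spirit as the RC_INVOKED checks you see in headers that are shared with the resource compiler. The customer told us only about the two #defines. The QQ_PREPROCESSING_MESSAGES symbol, the CheckAbc and CheckXyz helpers, and the numeric error codes are all made up for illustration; the real error codes would come from the header that the message compiler generates.

```c
/* ---- qqlimits.h (hypothetical contents) ---- */
#define MINIMUM_ABC_VALUE 1
#define MAXIMUM_XYZ_VALUE 100

/* QQ_PREPROCESSING_MESSAGES is an invented symbol. The idea is to define it
   only on the command line that preprocesses qq.mcp, so that the C-only
   declarations below never leak into the generated qq.mc. */
#ifndef QQ_PREPROCESSING_MESSAGES
int CheckAbc(int abc);
int CheckXyz(int xyz);
#endif

/* ---- qq.c (hypothetical consumer of the same limits) ---- */

/* Placeholder error codes so the sketch stands alone. In the real program
   these come from <winerror.h> and the header generated by mc. */
#define ERROR_SUCCESS        0
#define ERROR_ABC_TOO_SMALL  1001
#define ERROR_XYZ_TOO_LARGE  1002

/* The checks use the same macros that the message text is built from,
   so the limit and the message cannot drift apart. */
int CheckAbc(int abc)
{
    return abc < MINIMUM_ABC_VALUE ? ERROR_ABC_TOO_SMALL : ERROR_SUCCESS;
}

int CheckXyz(int xyz)
{
    return xyz > MAXIMUM_XYZ_VALUE ? ERROR_XYZ_TOO_LARGE : ERROR_SUCCESS;
}
```

With a guard like this, the makefile rule would pass /DQQ_PREPROCESSING_MESSAGES to cl when it preprocesses qq.mcp, and the C compiler would see the header without it.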

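Remember too that the preprocessor tokenizes your message text as if it were C. A macro name that appears inside double quotes is part of a string literal as far as the preprocessor is concerned, so it is left alone, and the C standard leaves the behavior undefined if a quotation mark or apostrophe has no partner on the same line. The following sketch shows the string literal rule at work in ordinary C. The function names and the STRINGIZE helpers are invented for the demonstration, and MAXIMUM_XYZ_VALUE is defined inline so the sketch stands alone.

```c
#include <stdio.h>

#define MAXIMUM_XYZ_VALUE 100   /* would normally come from qqlimits.h */

/* Two-step stringizing: the outer macro expands its argument first,
   then the inner one turns the result into a string literal. */
#define STRINGIZE_RAW(x) #x
#define STRINGIZE(x) STRINGIZE_RAW(x)

/* Inside the quotes, the macro name is just text and is not replaced. */
const char *QuotedMessage(void)
{
    return "The XYZ parameter cannot exceed MAXIMUM_XYZ_VALUE.";
}

/* Outside the quotes, the macro is replaced before the pieces are joined. */
const char *ExpandedMessage(void)
{
    return "The XYZ parameter cannot exceed " STRINGIZE(MAXIMUM_XYZ_VALUE) ".";
}

/* Prints both versions so you can compare them. */
void PrintBothMessages(void)
{
    puts(QuotedMessage());
    puts(ExpandedMessage());
}
```

The practical upshot for qq.mcp is that if you put a macro name between double quotes in your message text, it will come out the other end unexpanded.
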
Comments (24)
  1. Anonymous says:

    Hmm… The C preprocessor can be a bit too tied to C/C++ language syntax for that to be generally useful. ISTR problems with quote (') and double-quote (") marks needing to balance out properly, and macros within quotes of either kind being ignored, as well as some other weird behaviours I didn’t expect (because I wasn’t thinking with my C hat on) when I tried this.

    m4 (or another general text substitution/expansion tool) might be a better fit. Not sure if MS have an m4 implementation (as part of SFU maybe?) but I think that GNU m4 is available for Windows.

    [Forest vs trees. Feel free to use whatever preprocessor works best for you. The point remains: If you want a preprocessor, use a preprocessor. -Raymond]
  2. Anonymous says:

    I must be in the minority then – I started using mc for something in 2007!

  3. Anonymous says:

    Was this post inspired by Googlewhacking?  You’ve written a plausible post about two tools (message compiler and make) that probably aren’t being used together by anyone.

  4. Anonymous says:

    Will this compile in the message compiler, even though the preprocessor includes a header file into the mc file?

  5. Anonymous says:

    @asdasd > Good point. You’d have to be careful to ensure that the include file only contained preprocessor directives, and not, for example, enum definitions, global (const) variable declarations, function declarations, etc…. Otherwise that could cause syntax errors in the mc file.

    I think comments would be OK, as I’m pretty sure it’s the preprocessor’s job to strip out comments. And there might be an option/switch to turn comment elimination on/off. But that’s another gotcha – you have to be careful about comment markers in text you want to keep. Fortunately they’re not that likely to crop up, unless they’re the messages for a C preprocessor/compiler! :-)

  6. Anonymous says:

    The C preprocessor is surprisingly useful for a variety of languages. We use it to preprocess Javascript, for example, which cuts down on maintenance quite a bit. (Try to create a Javascript assert() that has both a clear syntax and zero non-debug overhead without /bin/cpp.)

  7. Anonymous says:

    Raymond, Daniel, and anyone else crazy enough to do this:

    The C preprocessor is specified only to work on C. When you run it over something that isn’t C you are basically saying "Oh, I do hope I introduce hard-to-find bugs into this system, that is sure to delight future maintenance programmers".

    Don’t do it. If you want a general purpose macro language, use one. But then you’re back into another of Raymond’s pet issues about tool dependencies. The customer’s set of tools probably doesn’t yet include m4. If you’re going to make an exception for m4, why not make an exception for Perl? Or just use a better OS platform with nicer tools ? Or give it all up and become a goat herder ? With a suitably wide frame of view all these are valid answers, right?

  8. Anonymous says:

    Nick, there’s a saying: the first time it’s a hack; the second time, it’s a trick; the third, a well-established technique.

    The C preprocessor is standardized and robust. That it was designed to work with the C language has no bearing on whether it actually /does/ work with other languages. In using it that way, you’re not relying on implementation details subject to arbitrary change, but on precisely standardized behavior. There’s no harm in that.

    The tool dependency issue for CPP is also a non-issue because practically every development environment includes a copy. It’s universal.

    Also, in my book, m4 is a perfectly reasonable build dependency. Many projects (such as Mozilla) use ports of Unix tools to build even under win32. There’s no shame in that.

    As for your argument that using CPP will confuse maintenance programmers: come on, the idea that we should never do anything that might confuse anybody was the central thesis behind COBOL. We’ll never make any progress that way. The advantages of preprocessing far outweigh the very slight learning curve. Any programmer worth his salt will easily be able to cope with it.

    (Unless you’re hiring, that is, straight out of a school that teaches C pointers by explaining (and I quote verbatim), "the star means it’s like Java". If you hire such people, may God have mercy on your code.)

  9. Anonymous says:

    Yeah, we use this on our web configs for multiple versions of the web config depending on what we are doing. Handy.

  10. Anonymous says:

    > The C preprocessor is specified only to work on C.

    Yeah, think of all the idiots who use it with C++ and IDL. What were they thinking?

    A true C preprocessor knows nothing about C in the first place, that’s the compiler’s job. So exactly how it’s supposed to only work on C I don’t know.

  11. Anonymous says:

    @Nick Lamb

    Are you kidding? The C preprocessor is heavily used to preprocess many things. I can’t see any danger in using it. All it does is strip C-style comments and respond to #-sign commands.

    There is nothing simpler or better understood when you want to use defines, includes, and comments in some kind of script or control file that is processed by a utility or a parser you wrote.

    What would be dumb is using some other "macro language" that no one knows, and that requires some tool no one has.

  12. Anonymous says:

    I’m also using the message compiler. To localize my company’s product, there is the .mc file, a dozen .rc resource files, files of global LPCTSTRs generated by automatically stripping out string literals from the C++ source and then compiled into resource .dlls, and C# .resx files generated from .txt files generated by stripping out literal strings from C# code.

    My big task was to take all these sources of translatable strings and generate Excel files to send to our translators and then use those Excel files to create translated resource binaries. I currently loathe code pages.

    About the C preprocessor, if there aren’t macros to expand or directives to process, it doesn’t go arbitrarily changing the source code. And if you’re doing something vile like in Raymond’s "A rant against flow control macros" http://blogs.msdn.com/oldnewthing/archive/2005/01/06/347666.aspx , then you deserve what you get. And as Raymond says, you can write a custom preprocessor, or even a batch file driving a regex-replace utility.

    — T

  13. Anonymous says:

    >> You’ve written a plausible post about two tools (message compiler and make) that probably aren’t being used together by anyone.

    I can happily confirm they *are* still being used together.

    I wouldn’t be surprised if developers brought up only using IDEs have a mental road block to doing anything that isn’t on the menu options.

  14. Anonymous says:

    The open-source folks use the C preprocessor a lot – preprocessing everything. Especially anything that can run through gcc has a language variant that understands preprocessor commands.

    Anyhow, it encourages clean code and symbolic constants – a good thing. You put all your error defines in a header, and now it’s preprocessable with anything that might need it – assembly, HTML, README, what have you.

    Defining things once is a good thing – less to maintain and less to worry about constants getting out of sync.

    And less reliance on arbitrary commands to do the same thing.

    E.g., in assembly, the assembler often has directives to include a file. Or you could do #include "defs.inc".

    Ditto when doing conditional compiles – the syntax differs, but if you can use #ifdef, the code seems more familiar… plus the C preprocessor can do a lot more than what other tools often allow…

  15. Anonymous says:

    I use php to preprocess custom-format xml (describing UI) into cpp, a lot like code-behind files in C# (aided with partial classes). Only the IDE integration is lacking. I wish there was a way to similarly view these "to-be-preprocessed" files as sub-items there.

  16. Anonymous says:

    “The C preprocessor is standardized and robust. That it was designed to work with the C language has no bearing on whether it actually /does/ work with other languages. In using it that way, you’re not relying on implementation details subject to arbitrary change, but on precisely standardized behavior.”

    Nope. Only that fraction of the implementation which matters to a C compiler is standardized. The C preprocessor, for example, need not care about the difference between TAB and other kinds of whitespace. Did the text you were pre-processing care? Shame, you’ve just introduced a hard to find bug.

    “A true C preprocessor knows nothing about C in the first place, that’s the compiler’s job. So exactly how it’s supposed to only work on C I don’t know.”

    Again wrong, the C preprocessor is required to use C’s tokenisation. You probably unconsciously rely on this all the time. If you mention the name of a macro in a constant string for example, you are completely unastonished that CPP ignores it – that’s not a symbol it’s just part of a string. See?

    “All it does is strip C-style comments and respond to #-sign commands”

    And wrong a third time.

    If you don’t know, don’t make up an answer and then insist you’re right no matter what. That works for politicians but it doesn’t work in programming.

  17. Anonymous says:

    @porter – the preprocessor is heavily dependent on a particular somewhat idiosyncratic tokenization model (do you know why 0x1E-2 isn’t 0x1C?) and also if you #include any files it will emit #line directives for the compiler to pick up. It will not necessarily work well for languages that are not designed to cope with this. That a few languages other than C either are designed to work with it or happen to coincidentally work with it does not change this fact.

    In other words – yes it is well-defined standardized behavior, but that standard also includes behavior you may not want. So by all means try it – it might work. Just cross your fingers, and don’t be surprised if it doesn’t.

  18. Anonymous says:

    > also if you #include any files it will emit #line directives for the compiler to pick up

    "#line" can be turned off.

    I see what you are saying, for files that have a format which abides by the "cpp" processing rules, eg C, C++, def files, IDL files, C# and Java etc then "cpp" is appropriate.

    Where you have more free-form formats, especially ones including unquoted human text, it’s asking for trouble.

  19. Anonymous says:

    Alas even "cpp" can’t make up for the fact that Visual C++ 2008 Express omits MC.EXE.

  20. Anonymous says:

    > And wrong a third time. If you don’t know, don’t make up an answer and then insist you’re right no matter what. That works for politicians but it doesn’t work in programming.

    @Nick Lamb: That insults our two decades of experience with the C preprocessor.

    You’re "making up" an unsubstantiated case against the C preprocessor with allegations that it introduces ‘hard to find bugs’. Where are the examples?

    I’ll tell you what "doesn’t work in programming". It’s writing your own macro/include/conditional compilation system instead of using the one that’s there, debugged and well-understood. Or worse, not having one at all and then creating more work and maintenance for everyone on the team.

    Hard-to-find bugs? How about those caused by copy/paste or by having to maintain different identical files because a stubborn dev didn’t want to use CPP to feed another tool?

    Or even doubly-worse, inflicting on the team a perl script that pre-processes these files in a way that no one other than the author fully understands.

    We use the CPP for dozens of different types of custom or standard control files or scripts. Everyone does. Do you put numeric constants in your .rc files to avoid using the preprocessor? What about IDL files?

    I’m absolutely outraged that you would write ‘those of you crazy enough to try this’. Everyone uses the C preprocessor on non-C code. Every time you compile your Windows app resources, you do.

    Should you use it for the message compiler too?  Naturally.  There is no fault that can be found with that suggestion.

  21. Anonymous says:

    > “All it does is strip C-style comments and respond to #-sign commands”
    >
    > And wrong a third time. If you don’t know, don’t make up an answer and then insist you’re right no matter what. That works for politicians but it doesn’t work in programming.

    FYI: What I said is exactly what the C preprocessor does. Do you know what the CPP is? It’s a separate tool that runs before the C compiler. It knows nothing of the C language syntax.

    The C preprocessor is compatible with any ASCII file that doesn’t need the "#" pound sign or comments for its own purposes, or give double quotes a meaning other than delimiting a string.

    If whatever file you intend to process isn’t compatible with that, the option of using CPP didn’t even come up.

  22. Anonymous says:

    Wow, I think I made Ulric pretty mad, he’s lost the ability to write coherent sentences and spell words.

    Ulric, take a text editor, and create a nice simple text file that contains just a tab (U+0009) character and then the word FILE immediately followed by a full stop and the three letters DOC.

    Now run it through your preprocessor†, and see what yours does to the tab. In a lot of preprocessors your U+0009 tab will mysteriously turn into a U+0020 space, but not all of them.

    Next, try defining FILE to a number, let’s try 1066. Hey, my preprocessor added some more whitespace after the number. It turns out to be convenient to avoid tokenisation problems with numbers in the next phase of compilation, so your preprocessor probably does this too. But not with every definition, ordinary words are left alone. On the other hand, if it’s not actually an ISO C preprocessor, it might not add whitespace at all. More unpredictability.

    Neither of these little "surprises" is on your list. There are dozens more. Allow me to repeat myself: The C preprocessor is for C, if your input language has its own preprocessor, use it, if it doesn’t, use a general purpose macro processor like m4.

    † Those at home feel free to play along with different CPPs.

  23. Anonymous says:

    @porter: That doesn’t matter — a lot of those sorts of tools are missing from there.  But it’s in the Windows SDK.

  24. antoineL says:

    I am not sure I understand the positions in this flamewar: while cpp is independent of C semantics, it does require something pretty akin to C syntax, for example escaping, " and ' used for string literals, /* */ for comments, handling of whitespace, # for metacommands, 0x meaning hexadecimal overriding, etc.

    As a result, cpp is unsuitable for use with Intel-syntax assemblers, including both MASM and NASM; or Makefiles; or the special syntax with FILE.DOC Nick described just above.

    But the other side of the same point is that several languages were designed with that in mind; and it was also obviously done so for RC, MIDL, … and MC! So I do not see why it would be a bad idea to rely on this.
