Poor man’s comments: Inserting text that has no effect into a configuration file


Consider a program which has a configuration file, but the configuration file format has no provision for comments. Maybe the program has a "list of authorized users", where each line takes the form allow x or deny x, where x is a group or user. For example, suppose we have an access_list file that goes like this:

allow payroll_department
deny alice
allow personnel_department
allow bob

This is the sort of file that can really use comments because people are going to want to know things like "Why does Bob have access?"

One way of doing this is to embed the comments in the configuration file in a way that has no net effect. You can do this to add separator lines, too.

deny !____________________________________________________________
allow payroll_department
deny !alice_is_an_intern_and_does_not_need_access_to_this_database
deny alice
deny !____________________________________________________________
allow personnel_department
deny !____________________________________________________________
deny !temporary_access_for_auditor
deny !see_service_request_31415
deny !access_expires_on_2001_12_31
allow bob

Assuming that you don't have any users whose names begin with an exclamation point, the extra deny !... lines have no effect: They tell the system to deny access to a nonexistent user.
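To see why the dummy lines are harmless, here is a minimal sketch of how such an access list might be evaluated. The evaluator itself is hypothetical: it assumes access is denied by default and that the last matching rule wins, and the names parse_rules, is_allowed, and the people table (including carol and dave) are made up for illustration. The real program may well resolve rules differently.

```python
def parse_rules(text):
    """Parse 'allow x' / 'deny x' lines into (action, name) pairs."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("allow", "deny"):
            rules.append((parts[0], parts[1]))
    return rules


def is_allowed(rules, user, groups):
    """Assumption: default deny, and the last rule that matches wins."""
    allowed = False
    for action, name in rules:
        if name == user or name in groups:
            allowed = (action == "allow")
    return allowed


PLAIN = """\
allow payroll_department
deny alice
allow personnel_department
allow bob
"""

COMMENTED = """\
deny !____________________
allow payroll_department
deny !alice_is_an_intern_and_does_not_need_access_to_this_database
deny alice
deny !____________________
allow personnel_department
deny !____________________
deny !temporary_access_for_auditor
deny !see_service_request_31415
deny !access_expires_on_2001_12_31
allow bob
"""

# Hypothetical users and their group memberships.
people = {
    "alice": {"payroll_department"},
    "bob": set(),
    "carol": {"personnel_department"},
    "dave": set(),
}


def decisions(text):
    """Map each known user to the access decision the rules produce."""
    rules = parse_rules(text)
    return {user: is_allowed(rules, user, groups)
            for user, groups in people.items()}


# The "deny !..." lines never match a real user, so nothing changes.
print(decisions(PLAIN) == decisions(COMMENTED))  # True
```

Under these assumptions, both files produce the same decision for every user, because no user or group name ever equals one of the !... strings.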

Sometimes finding the format of a line that has no effect can take some creativity. For example, if you have a firewall configuration file, you might use URLs that correspond to no valid site.

allow nobody http://example.com/PAYROLL_DEPARTMENT/--------------------
allow alice http://contoso.com/payroll/
allow nobody http://example.com/PURCHASING_DEPARTMENT/-----------------
allow bob http://contoso.com/purchasing/
allow nobody http://example.com/SPECIAL_REQUEST/-----------------------
allow ceo https://www.youtube.com/

Of course, these extra lines create work for the program, since it will sit there evaluating rules that will never apply. You may have to craft them so that they have minimal cost. In the example above, we assigned the comments to a user called nobody, who presumably will never try to access the Internet. We definitely didn't want to write the comment like

allow * http://example.com/PAYROLL_DEPARTMENT/-------------------------

because that would evaluate the dummy rule for every user.
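The difference in cost can be illustrated with a rough sketch. It assumes a hypothetical firewall that checks the user field of each rule first and compares the URL only when the user matches; the tuple layout and the count_url_checks helper are invented for this example, and a real firewall may order its checks differently.

```python
# Hypothetical rule format: (action, user, url_prefix). "*" matches every user.
def count_url_checks(rules, user):
    """Count the rules whose URL must be compared for a request from user."""
    checks = 0
    for _action, rule_user, _url_prefix in rules:
        if rule_user in ("*", user):
            checks += 1  # user field matched, so the URL has to be examined
    return checks


COMMENT_AS_NOBODY = [
    ("allow", "nobody", "http://example.com/PAYROLL_DEPARTMENT/"),
    ("allow", "alice", "http://contoso.com/payroll/"),
]

COMMENT_AS_STAR = [
    ("allow", "*", "http://example.com/PAYROLL_DEPARTMENT/"),
    ("allow", "alice", "http://contoso.com/payroll/"),
]

print(count_url_checks(COMMENT_AS_NOBODY, "alice"))  # 1
print(count_url_checks(COMMENT_AS_STAR, "alice"))    # 2
```

With the nobody form, the dummy rule is rejected on the user field alone. With the * form, every request from every user pays for one extra URL comparison per comment line.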

If you are willing to add a layer of process, you can tell everybody to stop editing the configuration files directly and instead edit an alternate file that gets preprocessed into a configuration file. For example, we might have access_list.commented that goes

//////////////////////////////////////////////////////////////////
allow payroll_department

deny alice // payroll intern does not need access to this database.

//////////////////////////////////////////////////////////////////
allow personnel_department

//////////////////////////////////////////////////////////////////
allow bob // Temporary access for auditor, see SR 31415. Expires 2001/12/31.

Everybody agrees to edit the access_list.commented file, and after each edit they run a script that sends the file through the C++ preprocessor and puts the result in the access_list file. By using the C++ preprocessor, you enable features like #include directives and #define macros.
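If you would rather not depend on a C++ preprocessor, the preprocessing step could also be sketched as a small script. This is a swapped-in technique rather than the preprocessor approach described above: it strips only // comments and blank lines, and it gives you no #include or #define. The function names strip_comments and preprocess are hypothetical. Note that this naive version would also mangle any rule containing a URL such as http://..., so it fits the access_list example but not the firewall one.

```python
from pathlib import Path


def strip_comments(text):
    """Drop everything from '//' to end of line, then drop empty lines."""
    kept = []
    for line in text.splitlines():
        code = line.split("//", 1)[0].rstrip()
        if code:
            kept.append(code)
    return "".join(line + "\n" for line in kept)


def preprocess(src="access_list.commented", dst="access_list"):
    """Regenerate the real configuration file from the commented one."""
    Path(dst).write_text(strip_comments(Path(src).read_text()))


SAMPLE = """\
//////////////////////////////////////////////////////////////////
allow payroll_department

deny alice // payroll intern does not need access to this database.

//////////////////////////////////////////////////////////////////
allow personnel_department

//////////////////////////////////////////////////////////////////
allow bob // Temporary access for auditor, see SR 31415. Expires 2001/12/31.
"""

print(strip_comments(SAMPLE), end="")
```

Running it on the sample prints the four original rules with nothing else, which is exactly the access_list the program expects.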

Comments (34)
  1. Spike says:

    Or, something I learnt from Reddit yesterday.  Inserting a URL into C++ code?

    {
      int i = 0;
      http://www.microsoft.com
      i += 1;
    }

    This is perfectly legal.  The compiler parses it as a label followed by a comment.

  2. Rick C says:

    Yes, but you can only do that once per…compilation unit, I'd assume.  Otherwise you get "label redefined" errors.

  3. Rick C says:

    "Expires 2001/12/31" spotted in a couple places.

    Wow, Raymond's article queue's gotten REALLY deep.

    [Nope, that was just an Easter egg. I wouldn't be surprised if comments like that were present in production. -Raymond]
  4. skSdnW says:

    I assume ;comments in .ini files and ::labels in batch files started life as hacks that are now de facto standards and "supported".

  5. Adam Rosenfield says:

    @Rick C: You can do that once per function, not once per translation unit.

  6. tialaramex says:

    Please don't use either the C or C++ preprocessors as a "poor man's macro facility".

    These pre-processors are very carefully defined to work correctly on C (or C++ as the case may be) and have a surprising number of weird corner cases that you or the maintenance programmer will end up cursing when they interact with your non-C language inputs in a surprising way.

    Choose a general purpose macro pre-processor. m4 is a perfectly nice choice if you have a POSIX system, but many of the readers of this blog don't, so maybe choose something else.

  7. Gabe says:

    I first saw the headline and thought this was going to be about the origin of DOS's REM command: DOS originally had no way to denote comments, so people used to writing BASIC defaulted to using REM at the beginning of comment lines.

    Of course DOS didn't know what to do with REM, so it would look for a REM command and fail, effectively making the comment line a no-op. After a few years of this, Microsoft codified the behavior by making a REM command that simply does nothing, allowing you to write comments that start with REM.

  8. Joshua says:

    When using the C preprocessor on non-C code, you should use the traditional preprocessor rather than the ANSI one, as the ANSI one barfs on anything that can't get by the C lexer. The option for gcc is -traditional-cpp; not sure what it is for MSVC.

  9. The problem with this approach is that it relies on the traditional notion of "being liberal with what we accept and conservative in what we produce."  If the ACL is being parsed as a list of commands, for example, or it uses a regular expression to parse users, and the resolution of users is done each time a line is read, then the program may fail to load, or may fail to parse the ACL altogether, leaving it in some indeterminate state.

    The preprocessed file situation greatly improves the problem from the editing side, but introduces other brittleness.

    [That's why you have to be careful to structure your comments as things that are syntactically legal (but have no effect). -Raymond]
  10. Lev says:

    Is it so hard to add a syntax for comment lines?

  11. @Lev: Particularly if you don't have source code access to the original program, yes.

  12. And that happens rather a lot in commercial environments.  That and it's maintained by a third party and getting comments added would be a billable feature request which will never get approved by people who have to sign off on it, not when you can produce a work around for (as far as they're concerned) free.

  13. Dan Bugglin says:

    @Gabe Interesting, I thought at least QBASIC used ' for line comments.  But maybe I am just thinking of VB, haven't used QBASIC in a while.  Plus you are probably talking about an older BASIC variant.

    Every DOS I've used displayed an error for commands it couldn't find.  And while nowadays in NT you can use 2>nul to hide the error, you couldn't redirect stderr back in the DOS days.  So I'm a bit confused how that actually worked.

    In the spirit of this discussion, you don't have to use REM for batch file comments:

    ::Copyright 2014 Initech, all rights reserved.

    @echo off

    : denotes labels.  A bonus is that labels are never echoed to the console, even with "echo on".  Putting :: is a good way to make a label you won't accidentally reference later.

  14. Christian Vogel says:

    This first suggestion is just revolting, seriously. Please, no one ever do this.

    The second one is fine as long as you are careful with the switches to your C preprocessor. It might add some "#pragma", "#line" or other things to its output file which are useful for good error reporting in the C compiler, but will confuse any program as dumb as the example Raymond presented/made up.

    If you ever have to deal with such idiocy, do yourself a favor and write a small Perl script (Windows guys will probably choose VBS, PowerShell, or similar). This script can then also do additional sanitization of the input, prevent "allow all" conditions, (in this example) check that users actually exist, … Again, programs as dumb as the one in the example (and those exist in the wild, I know…) will need any help they can get.

  15. Rick C says:

    @Adam Rosenfield, I didn't try to make an exhaustive test.  The … was supposed to indicate that.  In any event, generally a feature you can only use once isn't all that good.

    @tialaramex, I think (a port of) m4 is available for Dos/Windows.  Maybe via DJGPP?  I thought I saw it at some point years ago.

  16. mikeb says:

    >> m4 is a perfectly nice choice if you have a POSIX system, but many of the readers of this blog don't, so maybe choose something else. <<

    And therein lies the problem.  You've run into the reason why people on Windows still use the cmd batch processor far, far more often than modern scripting languages like Powershell or Python.

    The C preprocessor might have weird corner cases, but it'll handle simple stuff (like include these lines but not those lines) pretty nicely.  And it's available (often by default, but easily otherwise) on pretty much every system.  And it's widely understood how to use it – at least for the simple stuff.

  17. mikeb says:

    Douglas Crockford removed comment support from JSON so they wouldn't be used for metadata/directives – plus.google.com/…/RK8qyGVaGSr . When I've wanted to have comments in JSON files, I've used Raymond's suggestion of adding an otherwise ignored data item (named something like "__comment") to hold a comment string.

    Crockford suggests just using JavaScript comment syntax and running the file through the JSMin processor to strip them out before parsing the JSON file, similar to Raymond's suggestion of using the C preprocessor.

    I'd rather that JSON simply support a comment syntax, but there it is.

  18. Depending on the application and its sensitivity, I've seen two other forms of remarks:

    1. Add "REMARK Alice doesn't need access to this database", when the interpreter of the config file ignores unknown instructions like REMARK.

    2. Add "deny alice // Alice doesn't need access to this database" when the interpreter only interprets the first token after "allow" or "deny".

    I myself never do this. These are hacks by definition and can break. I wrote a program called Compile that takes a file in the form of "Filename.ext.uncompiled" and writes out "filename.ext", removing all lines that begin with the comment sequence. (Two 0xFF characters)

    And I hate the _ character.

  19. laonianren says:

    This is one of those things that, after years of programming, becomes second nature: if you're defining the format of a textual data file you include syntax for comments.

    Similarly, all binary formats get a version number.

  20. Azarien says:

    @Gabe: except that "rem /?" does SOMETHING, so your comment cannot start with "/?" ;-)

    and "rem /? > some_existing_file" may be even dangerous.

  21. Anonymous says:

    FIND /V ";" original > preprocessed

  22. John Elliott says:

    @The MAZZTer: ' is accepted as a comment in Cassette BASIC on an IBM 5150, and in BASIC-80 5.2 under CP/M.

    The CP/M command processor treats ";" as a comment character. In CP/M 2 it ignores input beginning with any character that isn't a valid first character for a filename ( = _ . : ; < > ) but in CP/M 3 it explicitly checks for ";". I'm guessing this behaviour didn't make its way into PCDOS 1, hence the need for REM and/or ::

  23. Jordan says:

    As someone who works on a C++ compiler, I would advise you /not/ to use a C++ compiler to preprocess random files, because it still has to tokenize as valid C++. No unbalanced apostrophes, for example. Use m4 or something instead.

    Also get off my lawn.

  24. Gabe says:

    When I want comments in my JSON file, I just use eval to parse it instead of a JSON parser.

  25. Klimax says:

    @Rick C:

    MingW, Cygwin also native version

    gnuwin32.sourceforge.net/…/m4.htm

    (IIRC)DJGPP is DOS only and won't run in x64 Windows.

  26. Lev says:

    @Gabe

    But you have to make sure the file comes from a trusted source, otherwise an attacker can insert code with side effects.

    Btw, in Matlab, str2num has precisely this problem: It calls eval.

  27. Neil says:

    Preprocessed XML, anyone?

  28. amroamroamro says:

    @Lev:

    which is why you should use str2double instead in MATLAB

  29. @Klimax says:

    I knew there was a version, and I was (sort of) even right, since both DJGPP and Mingw are ports of GCC. :)

    You're right about djgpp–it's 16-bit so won't run.  Fortunately that doesn't matter, what with m4 coming from elsewhere.

  30. Gabe says:

    Lev: Configuration files generally come from the correct side of the airtight hatchway, making eval a legitimate way to parse them. Here's a sample usage:

    "passwordPolicy": {

       // expiration is in ms

       "expiresAfter": 90 /* days */ * 24 /* hours */ * 60 /* min */ * 60 /* sec */ * 1000, /* ms */

  31. Not Norman Diamond says:

    @Jordan: Raymond didn't suggest using the C++ compiler, he suggested using the C++ pre-processor.

  32. Brian_EE says:

    I think that all of you who are discussing the various ways of pre-processing the file are missing a point. What if the person who is responsible for adding/deleting/changing the permissions (in this example Raymond gave) is a non-technical person? By layering on a whole bunch of steps/tools you add more probability of error into the process.

    Consider this example (Boss, Jim, to secretary, Kathy, who adds people to the file)

    Jim: "Kathy, Can you add David to the database access?"

    Kathy: Adds David, saves file, forgets to pre-process

    David: "Jim, I can't access the database"

    Jim: "Kathy, did you add David to the database"

    Kathy: "Yes. I don't know why it's not working"

    Programmers often think like programmers and don't think like non-technical users of their products.

  33. !see_service_request_31415 says:

    Why does Bob get access, but not me?

  34. Gabe says:

    Brian_EE: Even technical people get multistep processes wrong. When I had to manage sendmail once upon a time, I would have to edit the /etc/aliases file once every few months. Of course sendmail never knew about that file; it just knew about /etc/mail/aliases.db.

    So about half the time I changed an alias I would forget to run "newaliases" (to turn the aliases text file into the aliases.db binary file), and I would get a call about why the change didn't work.

    Now whenever I make a system that requires a preprocessing step, I make the system recognize that the source file has changed and automatically run the preprocessor.

Comments are closed.
