String comparisons against program output are not usually the best solution


A customer wanted to know whether the ICACLS program will be deprecated in Windows 10.

The reason is that they have a program that modifies file and directory permissions. It works by running the ICACLS program and then parsing the output to see whether the operation succeeded. They are working on a new release and wanted to know which APIs they should be using, and whether their existing technique would continue to work.

As a general rule, program output is designed for human consumption, not programmatic consumption. (There are exceptions, like sort, or reporting tools that are designed to have their output parsed.) If you tie yourself to the exact number of spaces between the date and the file size, or to the user's date and number formatting settings, or to the letters A-c-c-e-s-s and d-e-n-i-e-d, then you're going to run into trouble.
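A minimal Python sketch of the problem. The two messages below are illustrative stand-ins for the same failure on an English and a German installation, not ICACLS's verbatim output. A check hard-coded to the English wording reports no failure when the German message arrives, so the caller would treat the failed operation as a success.

```python
# Illustrative stand-ins for the same error as it might appear on an
# English and a German installation (not the tool's verbatim output).
ENGLISH_OUTPUT = r"C:\data: Access is denied."
GERMAN_OUTPUT = r"C:\data: Zugriff verweigert"


def failed_by_text(output: str) -> bool:
    """Fragile check: recognizes the failure only in its English wording."""
    return "Access is denied" in output


if __name__ == "__main__":
    # The English message is caught...
    print(failed_by_text(ENGLISH_OUTPUT))  # True
    # ...but the identical failure in German is silently treated as success.
    print(failed_by_text(GERMAN_OUTPUT))   # False
```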

If you're going to be manipulating file security, then you should be using functions like SetNamedSecurityInfo, which are part of the formal and documented API surface of Windows.
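If a program must keep shelling out to a tool for the time being, a stopgap that at least does not depend on the display language is to discard the text entirely and look only at the exit code, as some commenters suggest. This minimal Python sketch assumes the tool returns 0 on success and a nonzero value on failure; whether and how ICACLS does so should be checked against its documentation. The helper name `run_and_check` is hypothetical.

```python
import subprocess


def run_and_check(args: list[str]) -> bool:
    """Run a command, discard its (possibly localized) output, and report
    success based only on the exit code.

    Assumes the tool returns 0 on success and nonzero on failure.
    Raises FileNotFoundError if the program cannot be found.
    """
    completed = subprocess.run(
        args,
        stdout=subprocess.DEVNULL,  # human-readable text is ignored entirely
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0
```

On Windows, a call might look like `run_and_check(["icacls", r"C:\data", "/grant", "Users:(R)"])`, where the path is hypothetical. An exit code is a single number, so it cannot carry the detail that the output does; for anything richer, the documented security functions remain the better route.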

I found this question surprising because it came from a German customer. Presumably they were doing string comparisons against "Zugriff verweigert", and all their customers were in German-speaking countries. Either that, or they told their customers to install the English version of Windows.

Bonus chatter: One of my colleagues recommends Programming Windows Security for those who want to understand more on the topic. Just passing along the recommendation; I haven't read the book myself.

Comments (33)
  1. Ben Voigt says:

    I haven't had to mess with multilingual support in any meaningful way, but it's my understanding that each application inherits the locale from its parent process, via the environment block, and therefore it's sufficient for the customer's software to set the en-US language before spawning ICACLS. Only the user's shell would actually have its locale directly affected by the OS version (and even then, it should be possible to override with a per-user registry setting).

    Or is inheritance of locale something that only exists on OSes made in locales other than Redmond?

    1. Even if the program inherits the en-US locale, if you didn't also install the English language pack, all of its string lookups will fail!

      1. exchange development blog team says:

        Isn't English included by default?

        1. Entegy says:

          No it is not. The base language of Windows can change depending on your region of the world.

  2. Karellen says:

    Aren't programs designed for human *and* programmatic consumption? Can't the client just pipe the output to nul, and figure out whether the command succeeded from its return value (%ERRORLEVEL%)?

    Unfortunate that /q only suppresses success messages. Perhaps in a future version repeating it (/q /q, or /qq) could suppress error messages too?

    1. Billy O'Neal says:

      As soon as you toss localization for human readers into the mix, programmatic consumption typically goes out the window.

      1. Karellen says:

        Yes... hence throwing away (or suppressing) the textual output and using the return value instead in the programmatic use-case. (?)

        1. george says:

          I was going to mention the ERRORLEVEL, too, but you beat me to it. :) I'm not sure why it was so conveniently (and suspiciously) forgotten about in the article.

          In fact, I'd expect foreign language speakers would be the ones to know better than to do output parsing, because they will usually have to support English in addition to their own language. Most Europeans will prefer to use the English editions of Windows, because the localized version "sounds" quirky.

        2. morlamweb says:

          @Karellen: assuming that the command-line program even sets a proper errorlevel for the various return states of the program (and in that case, also assuming that the return codes are documented!), parsing a number is a poor substitute for the rich set of data that can typically be parsed from a program's output. In the customer case, it sounds like they needed more info from icacls than can be provided by a number, and so they tried to parse its output.

          Another possibility is that they wanted to offload the details of Windows security to a dedicated, built-in program for the task. Their program may have certain security requirements, and rather than having their code call the APIs, they use icacls to do the heavy lifting. I don't agree that this is the right choice, but I can understand this line of reasoning.

  3. Billy O'Neal says:

    >Programming Windows Security

    It's sooooooo good :D

  4. John Watson says:

    I realize that this is a Windows-centric blog, but designing programs for human consumption reflects the monolithic philosophy of the Windows API more than the Unix philosophy of composing smaller parts (https://en.wikipedia.org/wiki/Unix_philosophy). Having spent a lot of time lately on agile, continuous-delivery and automation work, I can say that the prevailing Windows philosophy makes it much harder to work with unless you buy into the entire Windows ecosystem 100%.

  5. jader3rd says:

    In my experience, when I'm trying to look up how to accomplish something via an API, nearly all of the search results end up describing how to do it as an end user, and it can be very difficult to figure out what the API is.
    It would be really helpful if every TechNet entry describing a tool in Windows also linked to the MSDN documentation for all or most of the APIs used by that tool. I realize that's a herculean effort, but I don't know how else to solve the problem (short of open-sourcing the tools).

  6. AndyCadley says:

    Unfortunately, this kind of behaviour is typical on Unix-like systems, which generally don't give you any other choice, so it tends to be a clunky habit carried across by people more used to working within those limitations.

    1. Karellen says:

      Is it? I'm trying to think of some examples of programs which expose functionality that is not available via some kind of C API, and I'm coming up short. Can you give an example (or two)?

      1. ChrisR says:

        A common case I can think of is getting the CPU usage. Not sure if there is any relevant API for that or not, but a lot of people recommend parsing the output of /proc/stat.

        1. Karellen says:

          Ah, of course! Thanks.

        2. Simon says:

          CPU usage (and other system-y sort of tasks) is one of those difficult cases, because it's not especially portable... every UNIX variant does things a little differently, so tools like "top" or "ps" have to be OS-specific, and their parameters and outputs are accordingly different.

        3. Cesar says:

          > Not sure if there is any relevant API for that or not, but a lot of people recommend parsing the output of /proc/stat.

          Parsing /proc/stat is the relevant API!

          (/proc is a virtual filesystem which exports information from the kernel. It's designed to be machine-parseable.)

          1. ChrisR says:

            Yeah, I should have said people recommend parsing the output of top or some other tool [1]. Still, I think my example holds in the context of Andy Cadley's post. People parsing the output of some file in /proc [2] [3] may think that accomplishing the same thing on Windows requires a similar solution, not realizing that almost everything on Windows can be done via function calls instead.

            [1] http://stackoverflow.com/q/9229333
            [2] http://stackoverflow.com/q/3017162
            [3] http://stackoverflow.com/q/1420426

      2. Nico says:

        I can't say if it's very common today or not, but the Unix design philosophy has been based around it for a long time:

        > "This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." (https://en.wikipedia.org/wiki/Unix_philosophy)

        I'm sure many programs provide a library call alternative to parsing text output (e.g., MagickWand for talking to ImageMagick with C) but there are also a lot of tools out there that parse text. How many scripts and tools are written to consume Git output? Is Git even translated to other languages?

        The world was so much easier when everything was 7-bit English :)

        1. As far as Git is concerned, most of its commands allow a parseable output instead of a localized one, usually by passing in --porcelain. There are other projects that offer a similar option when the default behavior includes localized text or would otherwise be difficult to ingest programmatically.

          1. AndyCadley says:

            Yeah, but Git having the --porcelain options is a bit of a band-aid "solution" to the problem. They knew developers were going to end up writing hacky text parsers on the output, and they vaguely attempted to contain the situation by trying to minimize the potential for breakages.

          2. Cesar says:

            @AndyCadley: With git it's different, the "plumbing" commands were written first, and were designed to be used in scripts, so their output was designed to be easily parsed. Those scripts later evolved into the more user-friendly "porcelain" commands.

            It's not a "vague attempt", it's a bona-fide API (for instance, "to add an object to git's database, call git-hash-object, the resulting object ID will be the only output on stdout").

        2. bmm6o says:

          The web has the advantage of the "Accept" header, which is a fairly standard way for callers to indicate what kind of output they want: html, json, xml, etc.

        3. exchange development blog team says:

          IT WAS EVEN EASIER WHEN IT WAS 6-BIT EBCDIC.

          //JOB1 JOB (F00F),'GCC COMPILE SOURCE CODE',PRTY=10
          //COMPILE EXEC PGM=GCC.
          //INFILE DD DSN=SED.GLOB.SOURCE.C.INFILE,DISP=SHR
          //OUTIFLE DD DSN=SED.GLOB.A.OUT.OUTFILE,
          // DISP=(NEW,CATLG,DELETE),

          1. 12BitSlab says:

            EBCDIC is an 8 bit code that was introduced with S/360 to replace various 6 bit codes used by various IBM Big Iron boxes.

      3. exchange development blog team says:

        There are a whole range of Unix utilities that have specific command-line options for producing machine-readable output rather than the default human-readable output, google "unix machine readable output" for example. Since any nontrivial task on Unix systems tends to end up as a pile of shell script there's a big need for having output that can be fed into some shell regex to control which further actions are taken.

    2. Yuri Khan says:

      On Unix-like systems, programs are explicitly designed to produce output which can be fed as input to other programs. In many cases, it involves adding command line switches that specify formats. Example: ls(1) by default shows file size in bytes (machine-oriented) and modification time as locale-formatted date or time (human-oriented), but one can specify the -h switch to get human-readable “315K” or “4.2G” size, or, conversely, --time-style=+%FT%T%z to get a machine-readable timestamp with explicit time zone.

      Unix gives you a lot of choice. You can choose to write your own programs calling APIs or interpret documented data structures in well-known locations on the file system; or you can parse text output of ready-made programs.

      The flip side of this coin is, of course, that a program's output format is part of its API and may have to be maintained.

  7. exchange development blog team says:

    >One of my colleagues recommends Programming Windows Security for those who want to understand more on the topic.

    What you'll learn there is development for Windows 2000. It's a decent book, but it was published 16 years ago, it's not at all current for stuff being done today unless you want to use the original low-level Windows NT way of doing things. For a better guide, look at some of Michael Howard's books, of which "Writing Secure Code for Windows Vista", while not entirely current, is still probably a better choice than Programming Windows Security.

  8. Ray Koopa says:

    From my own experience, "top quality" software doing such things comes from Germany... I once had to make a German software product multilingual, and things like this were just one of the problems (they abused the output of the DIR command to get files and sizes, and they were using VB.NET. Seriously? Needless to say, I quit that job after I saw such things for the tenth time).

  9. alegr1 says:

    A classic example of wrong localization killing a legitimate case of string comparison: the mciSendString API.

  10. Dave Bacher says:

    It used to be common (in the 2002 time frame) that I'd get off an airplane in Spain or Mexico, walk to a bank, and log in to the computer I needed to service, and they'd all be running the US English version of Windows on every desktop, regardless of the actual target language.

    The stated reason? Many programs didn't work correctly otherwise, and when they'd call support -- the support personnel could not cope with the translated messages.
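Several commenters point to output formats that are designed to be parsed, such as Git's --porcelain mode and the --time-style option of ls. The following minimal Python sketch parses a sample of each. It assumes `git status --porcelain` (version 1) lines of the form `XY PATH`, with renames shown as `old -> new`, and `ls -l --time-style=+%FT%T%z` lines for regular files whose owner and group names contain no spaces. Quoted paths, device files, and the leading "total" line of an ls listing are not handled. The function names are hypothetical.

```python
from datetime import datetime


def parse_git_status_porcelain(output: str) -> list[tuple[str, str]]:
    """Parse `git status --porcelain` (v1) output into (status, path) pairs.

    Each line is two status characters, a space, and a path. For renames and
    copies the path reads 'old -> new'; only the new path is kept. Quoted
    paths are returned as-is (the -z option avoids quoting altogether).
    """
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        status, path = line[:2], line[3:]
        if status[0] in "RC" and " -> " in path:
            path = path.split(" -> ", 1)[1]
        entries.append((status, path))
    return entries


def parse_ls_line(line: str) -> tuple[int, datetime, str]:
    """Parse one `ls -l --time-style=+%FT%T%z` line for a regular file.

    Returns (size in bytes, timezone-aware modification time, file name).
    The caller must skip the 'total' line that ls prints first.
    """
    # mode, links, owner, group, size, timestamp, name (name may contain spaces)
    _mode, _links, _owner, _group, size, stamp, name = line.split(maxsplit=6)
    return int(size), datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z"), name
```

In practice the input would come from something like `subprocess.run([...], capture_output=True, text=True).stdout`. Because the timestamp carries an explicit offset, it parses the same way regardless of the user's locale settings.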
