The Windows command line is just a string…


Yesterday, Richard Gemmell left the following comment on my blog (I’ve trimmed to the critical part):

I was referring to the way that IE can be tricked into calling the Firefox command line with multiple parameters instead of the single parameter registered with the URL handler.

I saw this comment and was really confused for a second, until I realized the disconnect.  The problem is that *nix and Windows handle command line arguments totally differently.  On *nix, you launch a program using the execve API (or its cousins execv, execvp, execl, execlp, and execle).  The interesting thing about these APIs is that they allow the caller to specify each of the command line arguments – the signature for execve is:

int execve(const char *filename, char *const argv [], char *const envp[]);

In *nix, the shell is responsible for turning the string provided by the user into the argv parameter to the program[1].
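To make the *nix side concrete, here is a minimal sketch of a parent that hands a child an explicit argv array. The helper name spawn_argv is made up for illustration; fork, execvp and waitpid are the standard POSIX calls. No string is ever parsed here, so whatever sits in one array element arrives in the child as exactly one argument.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with exactly the arguments given and
 * return its exit status, or -1 on failure. No shell is involved, so
 * spaces and quotes inside an element are passed through untouched. */
int spawn_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);   /* replaces the child process image */
        _exit(127);              /* only reached if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build something like { "someprogram", url, NULL } (names hypothetical) and pass it in; even if url contains spaces or quote marks, the child sees it as a single argv element.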

 

On Windows, the command line doesn’t work that way.  Instead, you launch a new program using the CreateProcess API, which takes the command line as a string (the lpCommandLine parameter to CreateProcess).  It’s considered the responsibility of the newly started application to call the GetCommandLine API to retrieve that command line and parse it (possibly using the CommandLineToArgvW helper function).
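As a rough illustration of what "the application parses it" means, here is a deliberately simplified splitter written in portable C. It is a stand-in, not the real thing: the function name and the fixed-size buffers are assumptions for this sketch, and it ignores the backslash rules that CommandLineToArgvW and the C runtime apply. The point is only that the child receives one string and decides for itself where the arguments begin and end.

```c
#include <stddef.h>

#define MAX_ARGS 16
#define MAX_ARG_LEN 128

/* Simplified, hypothetical splitter: whitespace separates arguments and
 * double quotes toggle "quoted" mode. It ignores backslash escapes, so it
 * is NOT the algorithm used by CommandLineToArgvW or the C runtime.
 * Returns the number of arguments stored in args. */
int split_command_line(const char *cmd, char args[MAX_ARGS][MAX_ARG_LEN])
{
    int argc = 0;
    const char *p = cmd;

    while (*p != '\0' && argc < MAX_ARGS) {
        while (*p == ' ' || *p == '\t')        /* skip separators */
            p++;
        if (*p == '\0')
            break;

        size_t len = 0;
        int in_quotes = 0;
        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            if (*p == '"')
                in_quotes = !in_quotes;        /* the quote itself is dropped */
            else if (len < MAX_ARG_LEN - 1)
                args[argc][len++] = *p;        /* overlong arguments are truncated */
            p++;
        }
        args[argc][len] = '\0';
        argc++;
    }
    return argc;
}
```

Two programs handed the same string can legitimately disagree about what its arguments are, simply by using different splitting rules.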

So when Richard talked about IE “tricking” Firefox by calling it with multiple parameters, he was apparently thinking about the *nix model where an application launches a new application with multiple command line arguments.  But that model isn’t the Windows model – instead, in the Windows model, the application is responsible for parsing its own command line arguments, and thus IE can’t “trick” anything – it’s just asking the shell to pass a string to the application, and it’s the application’s job to figure out how to handle that string.
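Here is a small sketch of how a single registered parameter can turn into several once everything is one string. The template text, the program name browser.exe and the helper expand_handler are all hypothetical; this is not the actual registration or the code IE runs, just plain substitution of a value into a "%1" slot.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy tmpl into out, replacing the first "%1" with
 * value verbatim (no quoting or escaping). Returns 0 on success, -1 if
 * the result does not fit or tmpl contains no "%1". */
int expand_handler(const char *tmpl, const char *value, char *out, size_t cap)
{
    const char *slot = strstr(tmpl, "%1");
    if (slot == NULL)
        return -1;

    /* prefix of the template, then the value, then the rest of the template */
    int written = snprintf(out, cap, "%.*s%s%s",
                           (int)(slot - tmpl), tmpl, value, slot + 2);
    return (written >= 0 && (size_t)written < cap) ? 0 : -1;
}
```

With the template browser.exe -url "%1" and the value http://example.com/" -other "x, the resulting string is browser.exe -url "http://example.com/" -other "x". Nothing was "tricked" at the API level: the launcher passed one string, and a child that splits on quotes and spaces will see -other and x as separate arguments.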

We can discuss the relative merits of that decision, but it was a decision made over 25 years ago (in MS-DOS 2.0).

 

[1] Yes, I know that the execl() API takes its arguments as a variable-length list rather than as an array, but execl() simply packs that list into argv before calling execve.

Comments (30)

  1. Dave says:

    > it was a decision made over 25 years ago (in MS-DOS 2.0).

    I am pretty sure that was actually inherited from CP/M; the (MS-,PC-,DR-,Q-)DOS COM file format used the same memory layout so that the "thousands" of existing CP/M programs could be ported over more easily. It also explains the 127-character limitation for DOS command lines that still exists today; the command tail (not including the command name) started at 0x81 and the program loaded at 0x100.
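The arithmetic behind that 127-character figure can be sketched against a simulated 256-byte program prefix. The offsets 0x81 and 0x100 come from the comment above; the length byte at 0x80 is my assumption about the layout, and the helper name is invented for this example.

```c
#include <stddef.h>
#include <string.h>

enum {
    PSP_SIZE        = 0x100,  /* the program image is loaded from here on */
    TAIL_LEN_OFFSET = 0x80,   /* assumption: a length byte precedes the tail */
    TAIL_OFFSET     = 0x81    /* first character of the command tail */
};

/* Bytes available for the tail: 0x100 - 0x81 = 0x7F = 127. */
enum { TAIL_CAPACITY = PSP_SIZE - TAIL_OFFSET };

/* Hypothetical helper: copy the command tail out of a simulated prefix
 * block into out as a NUL-terminated string. Returns the copied length. */
size_t read_command_tail(const unsigned char psp[PSP_SIZE], char *out, size_t cap)
{
    size_t len = psp[TAIL_LEN_OFFSET];

    if (cap == 0)
        return 0;
    if (len > TAIL_CAPACITY)
        len = TAIL_CAPACITY;      /* never read past the end of the block */
    if (len > cap - 1)
        len = cap - 1;            /* leave room for the terminator */

    memcpy(out, psp + TAIL_OFFSET, len);
    out[len] = '\0';
    return len;
}
```

Note that the program receives the tail as raw text and nothing in the layout splits it into arguments, which is the same division of labor the post describes for Win32.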

  2. Dave, that’s entirely possible.  OTOH, for OS versions before 2.0, launching a new program was actually a function of command.com – there was no OS API for launching a new process.

  3. Mike Dimmick says:

    It is of course worth noting that if you link your C program with mainCRTStartup or wmainCRTStartup, the C runtime decodes into argc/argv and calls your main or wmain function respectively.

    It’s unusual, but not forbidden, for a Windows application (i.e. an application that registers and uses its own window classes, rather than a console) to do this. The bit governing whether or not a console is created for the application is an independent setting, set in the PE header by the linker (/SUBSYSTEM:CONSOLE vs /SUBSYSTEM:WINDOWS). Visual Studio sets its defaults so console applications use (w)main, and Windows applications use (w)WinMain, but it’s not required. I don’t know what Firefox does but I’d take a guess that they might be using (w)main for portability.

  4. Mike: Absolutely.  I actually had a paragraph in the post describing that but edited it out (because I thought it rendered the narrative flow awkward).

  5. Steve says:

    And even if you use WinMain, you can still make use of the C runtime’s argument decoding by accessing __argc and __argv.

    In other words, the following are all completely orthogonal to each other:

    * Whether you are /SUBSYSTEM:CONSOLE or /SUBSYSTEM:WINDOWS

    * Whether your entry point is mainCRTStartup (calls main) or WinMainCRTStartup (calls WinMain)

    * Whether you access arguments via __argc/__argv or as a raw string from GetCommandLine

    * Whether your program creates a GUI or calls console APIs (or both)

  6. Steve, at the Win32 level, all of that is irrelevant.

    Win32 applications get their command line from the GetCommandLine() API, what they do with it after that is their business.  The entrypoint to the process may do preprocessing (mainCRTStartup or WinMainCRTStartup) or it might not.  But the key takeaway is that in the Win32 model, command line processing is handled by the child process, while in *nix, command line processing is handled by the parent process.

  7. Yuhong Bao says:

    This manual provides info on how programs were loaded in early versions of DOS. Be warned that most of the numbers are in decimal, NOT hex:

    http://www.patersontech.com/Dos/Docs/86_dos_prog.pdf

  8. Alun Jones expanded on this on his blog back when the fires were still raging:

    http://msmvps.com/blogs/alunj/archive/2007/07/23/firefoxurl-part-ii.aspx

  9. Adam says:

    You have a bit of an odd phrasing here which threw me for a loop.  ("In *nix, the shell is responsible for turning the string provided by the user into the argv parameter to the program.")

    I’d say the caller is responsible, rather than the shell.  A shell is only involved if you’re in a shell, or if your code calls system(), or popen(), or some other hugely dangerous system call, like pwnme().

  10. Mitch Denny says:

    When you use an obsolete command-line, you get obsolete command-line parsing. PowerShell is fast becoming the new command-line for Windows (it is designed to be). With it the arguments are parsed by the shell.

  11. Adam: You’re right, my bad.

    Mitch: What powershell hands to its applets is irrelevant.  I’m describing the Win32 command line handling semantics.  Powershell doesn’t use Win32 command line semantics when interacting with applets, that’s fine – I did say that this was an implementation decision.

    If powershell launches a Win32 process, it passes the arguments as a single string, because that’s the way that Win32 works.  Powershell can’t change it, because it’s just a shell.

  12. MM says:

    Theorem :

    A subset of a true phrase is not necessarily a true phrase.

    Proof :

    "PowerShell is fast becoming the new command-line".

    -> Probably true.

    "PowerShell is fast"

    -> AHAHAHAHAH! :(

    Really.. I don’t know how anyone could use it given its speed… :(

  13. Rosyna says:

    The "problem" is that POSIX functions (except for deprecated, highly-insecure functions like system()) take arguments explicitly as arguments. It will never take a series of characters to mean something other than it means.

    Windows, on the other hand, tries to find meaning in a string. Meanings which may be very unwanted and/or can be horribly insecure. There’s a reason why system() is so hated… it has the same problems as the GetCommandLine API and things have. system() gives special meaning to a string.

    In this case, Windows should put all of these functions in the banned functions file (like strcpy) and make new, explicit APIs that treat process paths and arguments as very, very different things. Security should trump backcompat if the old methods are clearly of a very borked design.

  14. Richard says:

    At the ISO C89 level, the main() function has well-defined arguments, and there is a de facto method for escaping parts of the command line in order to present those arguments to a program using the system-supplied C runtime. The firefoxurl vulnerability came about because there doesn’t appear to be any way for an URL handler to take advantage of that encoding — which, given that it takes its argument from an URL and uses it to form a command-line, is quite inexcusable. Ultimately, as Rosyna says, this is a threat in CreateProcess itself.

  15. Rosyna, I’m not sure that I understand the difference between the two paradigms, or why one is better than the other.

    In one paradigm, an application (the shell) parses a string and converts it to arguments.  In the other paradigm, an application (the application being called) parses a string and converts it to arguments.

    The only significant difference is that in the *nix paradigm, the callee doesn’t have to interpret the intent of the parent – but there’s also an opportunity for mischief there, because the parent can produce strings that are impossible for the shell to create (and thus may not have been tested by the application).

  16. R Samuel Klatchko says:

    The reason the *nix paradigm is better is that it doesn’t *require* the command line to be parsed.  A program can invoke another program and know exactly how it will see the arguments.  This is important when some of the arguments may come from an external source.  If code prompts for a URL and wants to pass it off to a browser, it is easy on *nix to guarantee the browser will see the URL as a single argument.

    With Windows, that is very difficult to do.  Because the command line must get parsed, it can be hard to control exactly what the new process considers arguments.  You can try surrounding the argument with quotes, but if the data has quotes and spaces, it can still turn one argument into multiple arguments.  You can scan the argument to try to look for bad characters (or better yet, look for characters that are not good), but this adds to the difficulty.

    The ability for the parent to control exactly what the child sees as arguments makes life easier.
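For what it is worth, here is a sketch of what "surrounding the argument with quotes" has to look like if the child uses the conventional Microsoft C runtime / CommandLineToArgvW rules: backslashes that precede a quote are doubled, and the quote itself is escaped. The function name and buffer interface are assumptions for this example, and it offers no protection when the child parses its command line by some other rule of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write a quoted copy of arg into out so that a child
 * using the usual MSVC-style parsing rules sees it as one argument.
 * Returns 0 on success, -1 if out is too small. */
int quote_arg(const char *arg, char *out, size_t cap)
{
    size_t n = 0;
    size_t backslashes = 0;
    size_t i;

    /* Worst case every character doubles, plus two quotes and the NUL. */
    if (cap < 2 * strlen(arg) + 3)
        return -1;

    out[n++] = '"';
    for (const char *p = arg; ; p++) {
        if (*p == '\\') {
            backslashes++;               /* defer until we see what follows */
            continue;
        }
        if (*p == '\0') {
            /* Trailing backslashes would escape our closing quote: double them. */
            for (i = 0; i < 2 * backslashes; i++)
                out[n++] = '\\';
            break;
        }
        if (*p == '"') {
            /* 2n+1 backslashes before a quote make the quote literal. */
            for (i = 0; i < 2 * backslashes + 1; i++)
                out[n++] = '\\';
        } else {
            /* Backslashes not followed by a quote are left as they are. */
            for (i = 0; i < backslashes; i++)
                out[n++] = '\\';
        }
        out[n++] = *p;
        backslashes = 0;
    }
    out[n++] = '"';
    out[n] = '\0';
    return 0;
}
```

Even with this in place the parent is still guessing at the child's parser, which is the asymmetry described above: on *nix the equivalent step does not exist at all.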

  17. Harry Johnston says:

    Rosyna: it should be noted that the Windows paradigm already treats the process path and the command string as separate things.

    By convention the first part of the command string passed to a new process is the name of the executable.  It is passed as it appeared on the command line, so it might or might not be fully qualified.

    However, if you’re using the Windows API to create a new process you can pass anything you like as the command string, regardless of the name or path of the executable.  Of course, it’s still just one string, not an array.

    The *nix paradigm has some advantages.  But there are also situations where the Windows paradigm has the advantage; for example, it means that an executable has the option of not giving quote marks a special meaning if it isn’t appropriate to do so, leading and trailing spaces can be significant, and so on.  Also I’m not sure that *nix supports Unicode on the command line?

    On the whole I suspect most programmers’ preference in this matter is based primarily on which model they were first exposed to. :-)

     Harry.

  18. Thank you Harry, that’s essentially what I was going to say (but you said it better).

    Ultimately, someone’s going to have to determine intent from a string, whether it’s the shell or the application.  In the Windows model, the app determines the parsing all the time.  In the *nix model, the app is at the mercy of the shell – different shells could very easily have different parsing algorithms, which means that depending on your choice of shell, your application might behave differently, and that’s never a good thing.

  19. Rosyna says:

    Harry,

    In the *nix model you can pass any string, any string at all for parameters. What the shell does by default is an implementation detail of the shell, not of the *nix model as a whole (a theoretical shell could require you to specify parameters in a way such that shell expansion never occurs). Additionally, if leading and trailing spaces should be significant, then the application can get them – as the parameters are passed as strings. Again, what the shell does by default is not the issue here, it’s what the model supports.

    Additionally unicode support depends on the shell (when launching from the shell) and the application. Most *nices nowadays use UTF-8, so full unicode support is available.

    Larry,

    You’re putting too much emphasis on the shell in general. It’s not about the shell at all, it’s about launching an application. In Windows the command line is presented as a single string of which meaning must be divined. In *nix the command line is presented as explicit, separate strings. There is no divining of intentions, because each parameter passed is exactly what it should be. And if you are getting your strings from the shell, then different parsing algorithms aren’t something that you should be concerned about anyway. All your application has to be concerned with is that it was passed N arbitrary strings – is it more work to make sure that the strings are sane? Perhaps. But it makes it easier to create more robust applications that are harder to attack.

  20. Harry Johnston says:

    Rosyna: in practice any realistic shell is going to have to separate parameters since otherwise most command-line tools won’t work.  This means that there is no way for an application to detect leading/trailing/intervening white space or the presence of quote marks when being launched from the shell (which, after all, is how most command-line applications are meant to be used).  This represents a genuine inconvenience to users, albeit only now and then.

    In contrast, the Windows model represents a genuine inconvenience to programmers – now and then. :-)

    I’m not trying to argue that the Windows model is definitively superior, by the way; on the contrary I don’t think either model is inherently better, just different.  (My ideal model would be quite different from either, probably closer to VMS than anything else.)

  21. Rosyna says:

    Harry, most applications in a user environment are not launched by the shell. They’re launched by double clicking them or automatically by some registered handler. There’s absolutely no reason for any extra parameters to be parsed by a shell.

    Quotes and spaces and carriage returns only have meaning when parsed by a shell. When they are passed to something like execl() there is no shell involved, and therefore there is no treating of those as special characters. Each argument/option is *explicitly* passed separately to the application being launched and there is no room for interpretation nor is there any chance the application will make one command into many or vice versa (as is the case with Windows).

    If I wanted shell expansion to occur, I’d explicitly use glob() and pass the result(s) of that to execv().

    "But there are also situations where the Windows paradigm has the advantage; for example, it means that an executable has the option of not giving quote marks a special meaning if it isn’t appropriate to do so, leading and trailing spaces can be significant, and so on."

    This is precisely the problem with the Windows paradigm. You may have a "trusted" process calling CreateProcess() with an arbitrary, user supplied string (such as the case with IE7) and passing it to a process that calls GetCommandLine() and does stuff with the string you can’t possibly predict, know about, or handle. This is why system() is well known to be extremely bad and a security risk. It causes things to be interpreted out of context which can lead to undesirable behaviours.

    The vulnerability comes from the fact that the string has to be parsed after the application has been launched. Take the following command line for example (from the recent security advisories but intentionally changed):

    dosomething.exe tothis"file

    From the Windows command line, the " will be parsed out and dosomething will get a single parameter "tothisfile"

    From a typical *nix command line, the " will be parsed out and dosomething will get a single parameter "tothisfile"

    For a typical Windows application launching dosomething, dosomething will see a single parameter "tothisfile".

    For a typical *nix application launching dosomething, dosomething will see a single parameter "tothis"file"

    That is the critical difference, and the core of the vulnerability. On Windows you click the link in your IRC client, and dosomething is passed "tothisfile". On *nix you click the link in your IRC client, and dosomething is passed "tothis"file". The former causes you to get bitten by a vulnerability. The latter does not.

    The problem is that under Windows you effectively always invoke the shell to parse the command line. Under *nix you only do so when the user is explicitly using the shell (unless the programmer is lazy or doesn’t know any better and uses system()). This allows your application to completely avoid any bugs that might be brought on by the shell, whereas in the Windows model you have to either trust the shell or roll your own.

    Larry, since you just did a 13 part series on threat modeling the PlaySound API, how would you threat model the combination of CreateProcess() and GetCommandLine/CommandLineToArgvW?

  22. Rosyna, There’s no threat modeling to be done here.

    The command line arguments to any program are untrusted and need to be parsed as if they were authored by a hostile entity.  That’s true, regardless of the command line processing paradigm.

    From a security standpoint, there is absolutely no difference between the two paradigms.

    Any application that trusts its command line has a potential security issue, especially on operating systems like *nix where the setuid bit can cause those applications to run with enhanced privileges without user interaction. Historically there have been a number of elevation of privilege vulnerabilities associated with setuid root applications incorrectly parsing their command line arguments.  For instance, 20 seconds with my search engine found this one: http://www.matasano.com/log/861/this-old-vulnerability-sendmail-869/ (yeah, I know it’s from 1985, it doesn’t matter).

    Ultimately, this is a religious issue, and thus there is no "right" answer.  IMHO, there is no clear "best" mechanism.

    The Windows paradigm allows for consistent parsing of command line parameters, regardless of how the application is launched.  The *nix paradigm allows the caller to specify specific arguments to a program without having to worry about the parsing algorithm used by that called program.

    They’re just different; neither is better or worse than the other.

  23. Harry Johnston says:

    Rosyna wrote: "Harry, most applications in a user environment are not launched by the shell. They’re launched by double clicking them or automatically by some registered handler."

    I was discussing command-line tools, not GUI applications.  Granted, in the context of a GUI (darned new-fangled contraptions that they are) the Windows model is potentially less convenient for the programmers, but in almost all cases it doesn’t matter, because Windows filenames aren’t allowed to contain quote marks or other "difficult" characters anyway.

    Larry wrote: "The command line arguments to any program are untrusted and need to be parsed as if they were authored by a hostile entity."

    Does this include cmd.exe? :-)

    Again, I guess this boils down to the distinction between command-line apps and registered-handler apps.  Trouble is Firefox is both; granted it’s used in the latter way more often, I for one depend on the former as well.  Looks like the only good solution is going to be to have two executables.  This *is* one case where the Windows model causes a non-trivial inconvenience … but they’re rare.

  24. Harry: Of course.  Having said that, as far as security’s concerned, a command line parameter validation error that causes a crash is just that – a crash, and not a security hole.

    On the other hand, if the command being launched is running in an elevated context (setuid root, for example) it’s an EoP attack.

  25. Norman Diamond says:

    Tuesday, October 09, 2007 4:32 PM by Harry Johnston

    > Windows filenames aren’t allowed to contain quote marks or

    > other "difficult" characters anyway.

    Oops.  Keep believing and spreading that, and you’ll produce new security problems.  Windows filenames aren’t allowed to contain double-quote marks and SOME other "difficult" characters, but they are allowed to contain SOME other "difficult" characters.

    As far as I can tell, malicious software can create a folder whose contents can’t even be viewed by Win32 applications, and can proceed to execute further malicious software that it drops in that folder.  As far as I can tell, one such folder was created by some Windows security patch (yielding the exact opposite of what a security patch is supposed to do) but it contained no files.  If such a folder were created by malicious software, it more likely would contain files.

    > Larry wrote: "The command line arguments to any program

    > are untrusted and need to be parsed as if they were authored

    > by a hostile entity."

    >

    > Does this include cmd.exe? :-)

    When cmd.exe is being executed by your friendly neighbourhood code red installation?

    Tuesday, October 09, 2007 4:51 PM by LarryOsterman

    > […] as far as security’s concerned, a command line parameter

    > validation error that causes a crash is just that – a crash, and

    > not a security hole.

    In this case, does "crash" mean "fail to even start executing the intended target"?  If so then I’m inclined to agree that it doesn’t yield a security hole.  But if "crash" means causing the host process to terminate (e.g. IIS) or other worse kinds of crashes, I think a lot of people consider those to be security holes.

  26. Norman Diamond says:

    Oh neat.  Not only can Win32 create folders that Win32 has trouble accessing (depending on how Win32 tries to go about accessing the folders that it created), and not only can Windows Services for Unix create folders that Win32 can’t access, but I’ve just seen Windows Services for Unix have trouble accessing files that Windows Services for Unix created.

    If keyboard handling had to be destroyed in order to improve security, I think I’d have some amount of grudging understanding.  But when keyboard handling is destroyed solely for the purpose of destroying keyboard handling, and security looks like it’s going to get worse instead of better, Microsoft still gives a big impression of not "getting it".

  27. R Samuel Klatchko says:

    > Ultimately, someone’s going to have to determine intent from a string, whether it’s the shell or the application.

    Larry, that isn’t true.  One thing you keep forgetting is that under *nix, you don’t have to use the shell.  When you are writing system code, you can directly invoke an exec system call and thus the intent can be determined by the engineer.

    So, if a program includes the code:

     execl("/path/to/exe", "exe", "arg1", "arg2", get_arg3(), (char *)NULL);

    the intent of each argument is built into the logic of the code and no string processing needs to happen.

  28. Igor says:

    And what if malicious program exploits the fact that "no string processing needs to happen" and does this:

    execl("/path/to/exe", "arg1 some garbage which leads to crash and", "arg2 arbitrary code execution", get_arg3(), NULL);

  29. R Samuel Klatchko says:

    > And what if malicious program exploits the fact that "no string processing needs to happen" and does this:

    There are a couple of things that don’t make sense here.

    1) First, I am not sure how this exploits the fact that no string processing is needed.  This should be doable with the Windows command line as well:

     CreateProcess("/path/to/exe \"arg1 some garbage which leads to crash and\" \"arg2 arbitrary code execution\"");

    Unless the Windows command line processing makes it impossible to send certain arguments, anything you can send with a non-parsed line can be sent with a parsed-line.

    2) I think we are talking about two different things.  I am referring to the case where a developer of a system needs to write code that spawns off a child with various arguments.  The developer needs to allow some of those arguments to come from user data but needs to make sure the child sees those correctly (so the email address field is only seen as a single argument).

    But you are talking about a case where the attacker can write a random program and put it on the system.  While I do not disagree that this is a problem, as I outlined in my first point, I do not see how parsing the command line solves this problem.

  30. Norman Diamond says:

    I have to retract one of my statements, in a response to Larry Osterman:

    <<  […] as far as security’s concerned, a command line parameter validation error that causes a crash is just that – a crash, and not a security hole. >>

    < In this case, does "crash" mean "fail to even start executing the intended target"?  If so then I’m inclined to agree that it doesn’t yield a security hole.  But if "crash" means causing the host process to terminate (e.g. IIS) or other worse kinds of crashes, I think a lot of people consider those to be security holes. >

    Depending on what was supposed to execute, "fail to even start executing the intended target" can be a security hole.  When a firewall was supposed to start executing but didn’t, all ports were wide open.