Why does OpenProcess succeed even when I add three to the process ID?


A customer noticed that if you add three to a process ID and pass it to the OpenProcess function, it still succeeds. Why is that?

Well, first of all, I need to say up front that the behavior you’re seeing is an artifact of the implementation and is not part of the contract. You’re passing intentionally invalid parameters; what did you expect? The context of this question is “We’re seeing this behavior and we can’t explain it,” not “We’re using this trick and want confirmation that it’s okay.”

Now, you actually know the answer to this already.

As we saw earlier, for convenience, the Windows NT kernel uses the handle manager to parcel out process and thread IDs, and the handle manager ignores the bottom two bits of handles. Therefore, adding three has no effect on the process-id-to-object mapping.
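To make the arithmetic concrete, here is a toy model of a lookup that ignores the bottom two bits. It is only a sketch: the names `slot_index` and `lookup`, the dictionary standing in for the table, and the ID 1000 are all made up for illustration and say nothing about how the kernel actually stores things.

```python
from typing import Dict, Optional

# Toy model of an ID-to-object lookup that ignores the bottom two bits.
# The table layout and every name here are assumptions for illustration.


def slot_index(process_id: int) -> int:
    """Drop the bottom two bits and use what is left as the table index."""
    return process_id >> 2


def lookup(table: Dict[int, str], process_id: int) -> Optional[str]:
    """Return the object the ID maps to, or None if the slot is empty."""
    return table.get(slot_index(process_id))


# Hypothetical table holding one process whose ID is 1000 (a multiple of 4).
table = {slot_index(1000): "notepad.exe"}

same_object = lookup(table, 1000 + 3)  # only the ignored bits changed
other_slot = lookup(table, 1000 + 4)   # the change reaches the index bits

print(same_object, other_slot)
```

Note that adding three is harmless in this model only because the starting ID is a multiple of four. Adding four, or adding three to an ID whose low bits are already set, carries into the bits that matter.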

This mechanism is peculiar to kernels based on Windows NT. Versions of Windows derived from the Windows 95 kernel have a different mechanism for mapping process IDs to processes, and that mechanism is unflinchingly rigid. If you add three, the OpenProcess function will reject your process ID as invalid. And I don’t know how Windows CE handles it.

Again, I wish to emphasize that the behavior you see in Windows NT-based kernels is just an implementation artifact which can change at any time. Who knows, maybe once they read this entry, the kernel folks will go in and change OpenProcess to be even more strict.

Pre-emptive Yuhong Bao comment: “Process IDs on Windows 95 are a pointer to an internal data structure XORed with a constant to obfuscate them.”
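For contrast, here is an equally rough sketch of a scheme like the one that comment describes, where the ID is an address XORed with a constant. The constant, the address, and the idea of validating against a set of known structures are all assumptions invented for this example; the text above only says that the real mechanism is rigid, not how it checks.

```python
from typing import Optional

# Toy model of a rigid ID scheme: the ID is an address XORed with a constant.
# OBFUSCATION_KEY and the address below are invented for this sketch.
OBFUSCATION_KEY = 0x5A5A5A5A

# Hypothetical addresses of live process structures.
live_structures = {0x80001000: "explorer.exe"}


def to_process_id(address: int) -> int:
    """Obfuscate an address into an ID."""
    return address ^ OBFUSCATION_KEY


def open_by_id(process_id: int) -> Optional[str]:
    """Undo the XOR and accept the ID only if it names a known structure."""
    address = process_id ^ OBFUSCATION_KEY
    return live_structures.get(address)


pid = to_process_id(0x80001000)
exact = open_by_id(pid)             # round-trips to the original address
off_by_three = open_by_id(pid + 3)  # decodes to an address nobody owns

print(exact, off_by_three)
```

In this model every bit of the ID takes part in the mapping, so there is no slack for a stray addition to hide in.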

Comments (42)
  1. Nathan_works says:

    I do wonder about such customers.

    Is this what happens to the kids who would normally try to drink drain cleaner as a kid, but manage to survive to adulthood ? Then they try equally dumb things like "gee, lets add 3 to some returned handle-like value and see what happens ?"

    I mean, Steve Irwin made a career out of it with poking testy animals..

  2. Reena Agrawal says:

    I had never noticed that process ids were multiples of 4, but seriously why would anyone even bother about trying something like that.

  3. Pierre B. says:

    Reena:

    It’s obvious: they had a program that launched 3 other processes and needed the process ID of the third one: take your own process ID and add 3. Voila! :-)

    (Don’t laugh. Some people think like this.)

  4. Mike Dimmick says:

    As I recall, Windows CE adds a randomization factor to any handle it gives out so it has a greater chance of success in detecting handles that outlived the object.

    In this case I would imagine adding three would point to nothing at all.

  5. Aaargh! says:

    The guys who implemented that bit of code should be smacked on the head, repeatedly.

    This is a clear violation of Curly’s Law: A variable should mean one thing, and one thing only.

    How the h*ll did code like this end up in something as critical as the kernel ?

    [The guys who implemented the handle table were following the spec: The bottom two bits of handles are ignored. -Raymond]
  6. Aaargh! says:

    "The guys who implemented the handle table were following the spec: The bottom two bits of handles are ignored"

    Then it’s a bad spec, and the guys who wrote the spec should receive the asswhooping.

  7. Greg D says:

    I sense a pre-emptive Aaargh! comment in the future…

  8. f0dder says:

    Aaargh: or perhaps developers should follow the code contract, and don’t make silly assumptions. Sometimes you do stuff, especially in kernels, for performance reasons.

    Don’t tell me that the "reserved for OS use" bits in the pagetable structures shouldn’t be used by the OS, because it might break a lame driver that does manual modification of the pagetable structures :)

  9. Aaargh! says:

    "Aaargh: or perhaps developers should follow the code contract, and don’t make silly assumptions."

    Of course you need to follow the code contract, but that doesn’t excuse the stupid design.

    " Sometimes you do stuff, especially in kernels, for performance reasons."

    Sometimes you do stuff like that, indeed. And almost every time you do it, it comes back to bite you in the arse eventually.

    It’s far more important to have a clean, future-proof design than to try and squeeze out every last bit of performance only to find yourself having spent 6 years on the release of your next OS version with nothing to show for it but a slightly updated GUI, just because you had to have all those performance- and compatibility-enhancing hacks in decades before.

    It’s just a bad case of short-term thinking. It’s a nasty habit humans have to only think of the short-term and never about long-term consequences.

  10. Alexandre Grigoriev says:

    This is a clear violation of Curly’s Law: A variable should mean one thing, and one thing only.

    A handle or object ID, by definition, is opaque. It doesn’t mean "one thing". It doesn’t mean a thing, actually.

    It has 30 significant bits plus its 2 least significant bits, which are ignored and can be messed with. Nobody should do any operation with those 30 significant bits, because there is no arithmetic defined for them, other than comparison to NULL.

  11. Aaargh! says:

    "Don’t tell me that the "reserved for OS use" bits in the pagetable structures shouldn’t be used by the OS, because it might break a lame driver that does manual modification of the pagetable structures :)"

    No, but you might ask yourself why that lame driver has access to those bits in the first place. Restrict access, encapsulate your data.

  12. Gordon says:

    Why is it a bad spec? Because you say so?

  13. Frymaster says:

    "No, but you might ask yourself why that lame driver has access to those bits in the first place. Restrict access, encapsulate your data."

    Then you get people moaning that you’re writing a nanny OS that doesn’t let Real Programmers get the control they need

  14. Ulric says:

    Poor Yuhong Bao, just trying to inform us a bit further on the subject…

    Aaargh! shouldn’t have gone postal on that post about the "accelerators for hidden controls still active".  THAT was a terrible hack.  Had I known about it… I probably wouldn’t have dared posting..

    :)

  15. Aaargh! says:

    "Why is it a bad spec? Because you say so?"

    No, because it stores 2 kinds of information in one variable. It makes things less transparent and thus it increases the risk of bugs. Changing the information stored in one part may affect the functionality of the other.

    Suppose someone forgets to mask the 2 lower bits before comparing a handle passed as a parameter to some kind of internal table. Most of the time this would be ok, because most of the time the 2 lower bits are left alone. But if some information was stored there, the compare would not find a match. Actually, with a codebase as big as Windows, I’ll bet you there is a bug like this somewhere in the OS.

    It’s an unnecessary risk, and it’s actually easier to use it wrongly (as this and previous posts about this subject prove) than to use it right.

  16. Aaargh! says:

    "Then you get people moaning that you’re writing a nanny OS that doesn’t let Real Programmers get the control they need"

    Those Real Programmers(tm) are the same people who complain their hackish app doesn’t work on OS version n+1. They won’t be missed.

  17. Chris says:

    On a rival operating system, we got tired of the compatibility issues and started carefully validating inputs and returning parameter errors (in new APIs) if we received bit values which would be otherwise ignored.

  18. Gordon says:

    @Aaargh: Why isn’t the caller masking out the two bits? If the hypothetical function is expecting a handle, the low bits should be 0 when the handle is passed in.

  19. Aaargh! says:

    "@Aaargh: Why isn’t the caller masking out the two bits"

    Why should it ? As someone stated before, the handle is opaque, it’s an internal data format for use by the kernel as the kernel sees fit. You’re just passing the data you got from some other kernel function.

  20. Leo Davidson says:

    "I do wonder about such customers."

    My guess is they were not doing this on purpose. Perhaps they had a table of handles and were accidentally incrementing the handles themselves instead of pointers to them when working on the table.

    Something like that may have resulted in a lot of head-scratching because, strangely enough, some of the handles in the table still worked but others didn’t. Once the bug was found the customer may have worked out that it was only the handles that didn’t have 4 or more added to them which worked, and then asked Microsoft why that was in order to properly understand the issue (and ensure there wasn’t another problem lurking in their code).

    Just a guess, of course. I’m sure there are lots of ways this problem could come from a valid mistake instead of something purposely doing stupid things by adding to handles.

  21. Ulric says:

    @Aaaargh, there is something you’re not getting.  Maybe that’s why you’re going on this:

    "Suppose someone forgets to mask the 2 lower bits before comparing"

    NT ignores the two lower bits.  It does NOT mean that YOU have to mask these lower bits.  You can’t "forget" to mask them, you’re not supposed to be masking them at all.

    It’s a handle.  When you get it, the two lower bits are always zero.

    The lower bits have no meaning to you; how NT uses them internally is an implementation detail.

  22. Aaargh! says:

    "@Aaaargh, there is something you’re not getting.  Maybe that’s why you’re going on this:

    (…)

    NT ignores the two lower bits.  It does NOT mean that YOU have to mask these lower bits. "

    I don’t think it’s me not getting it (see my post 2 posts above yours). NT is not some magical piece of code that just magically appeared out of thin air, it was written by someone. It’s not immune to bugs, which is exactly my point.

    NT is indeed SUPPOSED to ignore the lower two bits but SUPPOSE someone (@ microsoft, working on NT) forgets to do so, it won’t immediately show up. It might even ship with that bug (and it probably has in some place) which is exactly why this is a bad practice.

  23. Wolf Logan says:

    Wow, I think we’ve passed some kind of threshold here…Microsoft is now getting dinged for bugs that theoretically could exist, but have never been observed, and for which there’s no evidence of their likelihood.

    Maybe Raymond’s "preemptive snarky comments" have spawned "preemptive bug reports". You know, just in case it ever actually happens, someone will be able to say "See? I was right!"

  24. akex.r. says:

    Aaargh:

    At some point, when you look down the layers of your system, you don’t have variables, just a big pool of memory.

    It’s difficult to say memory shouldn’t be used for two things. But you could argue that a certain portion of memory should be used for a single thing throughout the life of a system… (at the extreme, basically preventing dynamic memory allocation.)

    Nevertheless, this use of the handle would still be valid in this respect, since the first 2 bits are always used for the same thing, as are the next 30.

    The limitation that the high level language most developers will use to access this memory can’t associate two different names with these two distinct portions of memory could be an issue when designing a language-agnostic OS… but it’s not clear to me that it should always be, especially when you have no idea what the high-level language will be down the road.

  25. nksingh says:

    Aargh:

    The bug you are afraid of (someone forgetting to mask the two low bits) is unlikely for kernel handles since all accesses to the handles quite sensibly go through a common set of routines that are pretty heavily tested.

    In some rival OSes, such as the ones Chris may use, different types of kernel objects exist in different namespaces and different handle systems, so such bugs may be likely due to duplicated code to deal with opaque descriptors.  NT has a centralized approach to handling HANDLEs (and pids), so I think you should not be so concerned about this possibility.

  26. Victor says:

    Aaargh:

    You’re an idiot, quit posting. Your form of argument is repeating your original statement over and over. Why don’t you go join O’Reilly on Fox? It would be an improvement.

  27. Aaargh! says:

    > "A handle or object ID, by definition, is _opaque_. It doesn’t mean "one thing". It doesn’t mean a thing, actually"

    Of course it means something, it’s at least a key in a table somewhere, or whatever.

    Maybe it’s opaque to the *user* of the handle, but inside the kernel, it has to have some meaning. If it didn’t mean anything, why have a handle at all ?

  28. Victor, it wouldn’t be an improvement. Because then there’d be two of them. Like attack of the pod idiots or something. :p

  29. Dean Harding says:

    Let’s all just stop writing code right now, because for every line of code we write, there’s a non-zero chance that a bug might be introduced.

  30. Reena Agrawal says:

    It’s obvious: they had a program that launched 3 other processes and needed the process ID of the third one: take your own process ID and add 3. Voila! :-)

    (Don’t laugh. Some people think like this.)

    :-D

    Wow!! Such an out-of-the-box solution.. :D

  31. Igor Levicki says:

    What Aaargh! is trying to say is that if you have a door handle, you expect it to open or close a door, not to have some bits slapped on it that are used for other tasks.

    @Aaargh!:

    Give it up man, some programmers just cannot see the flaw in their own code or reasoning, they are blind to their own errors by design.

    @Yuhong Bao:

    Congratulations, now you are the one being pre-empted.

  32. Yuhong Bao says:

    "Let’s all just stop writing code right now, because for every line of code we write, there’s a non-zero chance that a bug might be introduced."

    As we get closer to a release of software, we get closer to that point…

  33. tragedy of the commons says:

    Sigh, I fear the armchair kernel developers will have killed yet another topic for Raymond.  

    I wish he’d post all the would-have-been future stuff he wrote and killed off in some place that doesn’t allow comments.

    I happened to be reading through an older entry the other day, and it’s sad how jaded Raymond has become.  The contrast in tone is quite stark.

    All right, I can’t hold it in any longer: anyone who is complaining about the last two bits holding extra information should just shut up and stop spewing their ignorant, stupid complaints about internal use of arbitrary parts of an arbitrary number.  Clearly none of you has ever read (or just plain has not understood) a single historical article Raymond has written, because every time, some clown says, "why was this done in such a stupid way" and every time the answer is "because it had to run on a toaster with a 2 MHz processor and 5 bytes of RAM."  Bonus points if the person also complains about how bloated software is nowadays.

    Ok, so I now understand what it’s like to be in Raymond’s position, constantly inundated with ignorant comments insulting his work.  It looks like I’m jaded too now.

  34. Speck Tator says:

    Arrgghhh:  You’re an idiot.  You’re whining about an implementation detail, which is only visible when YOU (an app developer) do the wrong thing.  And then you’re making up examples of completely idiotic app behavior — and then blaming the results on NT.  That’s just idiotic.

    Igor Levicki: Congratulations, now someone looks stupider than you.  Yay, that’s something for you to be happy about.

    The fact that NT chose to use 30-bit handles (by setting the bottom 2 bits to zero when returning them, and ignoring them on usage) is an implementation detail, and has no relevance to app developers.  If an app developer relies on specific HANDLE values, then they are a f*ing idiot, and they deserve whatever brain-damage happens.

  35. ender says:

    Speaking of PIDs, weren’t they multiples of 2 on NT4 – so probably OpenProcess wouldn’t work if you increased PID by 3 before passing it to OpenProcess there?

  36. Michael J says:

    Good grief

    I love it when people rant about good practice without any apparent knowledge on the topic at hand.  Consider some basic principles:

    1.  To the external user a handle is a handle, no more and no less.  The API gives it to you and you use it verbatim.  If you mess with it, then maybe it works and maybe it doesn’t.  Nobody cares.  You’ve broken the contract.
    2.  On the inside (API implementation), a handle is whatever you want it to be.  An int, a pointer, a struct, whatever.  As long as you use it consistently and always behave as advertised, you are in the clear.

    3.  Much of this code is descended from antique versions that ran on 80286s (or worse) in 640K of RAM (or less).  In that environment, you used every bit as efficiently as possible.  The fact that version 93 now runs on modern chips with multi-GBs of RAM doesn’t mean that you want to break working code for some esoteric reason.  None of us (except possibly a few MS folks) know the reason behind the design of the relevant handle, nor (obviously) whether that reason is still important.

    4.  If you want to comment, why not try to add to the conversation, rather than just criticise others.  (Hmmm.  Kind of like I’ve been doing).

    OK, a positive thought: do our resident kernel experts know whether PIDs get re-used quickly, after a process dies?  If so, does that impact on how one programs when using PIDs?

    (Yes, I know the answer.  If nobody posts it, I’ll write something later.)

    Raymond, thanks for sharing your knowledge.  It is generally quite interesting, and this is no exception.

    Michael J.

  37. KenW says:

    @Aaargh!: "I don’t think it’s me not getting it"

    Yes, it is.

    The point you seem to be missing here is *adding 3 to a handle*. If a handle is an opaque item, why would you expect doing *any* mathematical operation on it to have meaningful results? It’s not an INT, or a WORD – it’s a HANDLE.

    @Igor Levicki: Great. You’re not only a troll, but now a liar. You promised us you weren’t going to post here any more. And congratulating  Yuhong Bao for being classified alongside you is an insult, not a compliment.

  38. Richard says:

    While the reactions of some may have been a little, shall we say, irrational, there is a potential issue here:

    A 32-bit value where the bottom two bits are not zeroed out is an invalid HANDLE. And yet kernel-mode functions accept such values. Any time a kernel-mode function accepts, and operates on, invalid data, there is the potential for a security issue. All it would take would be one kernel routine incorrectly dealing with such handles and not getting away with it, and one sufficiently cunning and unscrupulous programmer, and you have a privilege escalation.

    It’s very likely that all the relevant kernel routines are carefully written, and mask out the bottom two bits for you. There are at least three ways in which an “invalid argument” failure would have been better, or at least no worse:

    1) Incorrect application code now works by chance. MS lose; the bottom 2 bits have to remain unused forever, or important applications will break.

    2) Performing the check and masking off these bits are the same level of complexity, so the implementation cost of getting this right compared to the current solution is approximately zero.

    3) If such a HANDLE was required to cause an “invalid argument” error to be returned, that requirement would be easily testable. The current situation is “yeah, we’re pretty confident this will be OK, but we can’t easily prove it with a unit test”.

    Back when these APIs were designed, no-one really “got” security the way we hopefully are all starting to now. It’s understandable that the APIs were designed the way they were. It’s understandable that, in order to avoid breaking existing applications, they need to stay the way they are. But, if these APIs were being designed today, I am very confident that all invalid handles would result in an error return, rather than the current behaviour. Does that mean these APIs are not well-designed? I think that’s a judgement call.

    [Thanks everybody for not reading the linked articles. Especially the one that explains that the bottom bit is meaningful but in a different way. -Raymond]
  39. Speck Tator says:

    And yet kernel-mode functions accept such values. Any time a kernel-mode function accepts, and operates on, invalid data, there is the potential for a security issue.

    Uhhh, bullshit.  You think you’ve found something, but you’re just inventing something out of nothing.

    The bottom two bits are ignored.  There are NO security implications to this.  All handle-to-object translations are handled by the same module, which is what handles shifting the handle value down by two bits.

    Back when these APIs were designed, no-one really "got" security the way we hopefully are all starting to now.

    Again, bullshit.  The attack scenarios were mostly based on user-to-user attacks, not data-to-user attacks, but it’s bullshit to assert that no one "got" security.  And handle table look-ups are an obvious place where security matters.

    NT-based OSes have been on the market for about 15 years now.  The fact that NT handles ignore the bottom 2 bits has never been "exploited", because it isn’t a freakin’ bug.

    Get over it, people.  You WISH there were a problem here, but wishing don’t make it so.

  40. Edgar says:

    My guess:

    The 2 bits are never used in the Kernel, but in the System-Layers above.

    There it is used to store state information for an object.

    Each process can therefore store 2 bits of information in all handles(objects).

    You avoid reading the object behind the handle.

    Good design ?

    Performance rules sometimes more.

    As I said before: Only a guess.

  41. Richard says:

    @Speck Tator

    You seem to be misinterpreting what I said.

    > > And yet kernel-mode functions accept such values. Any time a kernel-mode function accepts, and operates on, invalid data, there is the potential for a security issue.

    > Uhhh, bullshit.  You think you’ve found something, but you’re just inventing something out of nothing.

    I think no such thing. Do you genuinely not understand the difference between a *potential* security issue and an *actual* security issue, or are you deliberately making a strawman of my argument? Do you really think that kernel calls shouldn’t always validate their arguments?

    Now Raymond makes a good point here — despite saying "these are not valid kernel-mode HANDLEs", one third of them actually are. But the argument appears to apply unchanged to HANDLEs with bit 1 set.

    > The bottom two bits are ignored.  There are NO security implications to this.

    That’s quite an assertion you’re making there. Can you prove that? Or are you just posturing?

    Suppose some part of the kernel were to assume that two handles were equivalent iff they were equal. I could easily imagine situations where that could enable an attacker to do something e wasn’t supposed to be able to do. Again, I’ll remind you that this is a *potential*, *hypothetical* security issue. However, anyone who "gets" security knows that, unlike hypothetical bugs, hypothetical security issues *are* very important. If you don’t check them, attackers certainly will, eventually.

    And I’m sure someone will come up with some clever reason why that particular type of problem will "never" cause problems in practice (ho, ho). But it’s just one example of possible problems. When you’re in the security game, the burden of proof is on you to show your code has mitigations for the relevant potential issues, not on others to show it doesn’t.

    > > Back when these APIs were designed, no-one really "got" security the way we hopefully are all starting to now.

    > Again, bullshit.  The attack scenarios were mostly based on user-to-user attacks, not data-to-user attacks, but it’s bullshit to assert that no one "got" security.

    That’s why I didn’t assert that. Strawman again. I said (more or less) that no-one "got" security *to the extent we do now*. No-one did STRIDE back then, for instance. Security analysis was a much more casual, informal process. It is the nature of our industry that we learn more as time progresses.

    > NT-based OSes have been on the market for about 15 years now.  The fact that NT handles ignore the bottom 2 bits has never been "exploited", because it isn’t a freakin’ bug.

    Of course it’s not. The fact that the bottom 2 bits are ignored isn’t a problem. The problem is making sure that everything that should ignore them, actually does ignore them. And my point (which I think you’ve indirectly agreed with by saying "handle table look-ups are an obvious place where security matters") is that failing to ignore them could have security implications, where a different design would be immune to that class of potential issues.
