How can I make sure my program is launched only from my helper program and no other parent?


Say you have a collection of programs which work together. One of them is the “master” program that runs the show, and it has a bunch of “assistant” programs that it launches to accomplish various subtasks. These assistants are not meant to be run by themselves; they are meant to be run only by the master program. How do you design the assistant so that it can only be run by the master?

There’s nothing you can do to force the assistant to be run only by the master, since anything you do to detect the case can be faked out by an attacker. (Worst case is that they just run your program under the debugger and patch out the code that looks for the master.) So the purpose of this test is not so much to create an airtight hatchway as it is to prevent users from randomly wandering into the Program Files directory and double-clicking stuff to see what happens.

The simplest way of doing this is to require a command-line parameter that the master passes to say, “Hey, it’s me, the master. It’s okay to do that thing you do.” The command-line parameter could be anything; say, assistant.exe /run. If the command-line parameter is not present, then the assistant says, “Um, please don’t run this program directly. Use the master.”
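
In code, the assistant’s check can be as simple as scanning the command line for the agreed-upon flag. Here is a minimal sketch; the /run flag and the message text are placeholders, not a prescribed convention:

    #include <windows.h>
    #include <wchar.h>

    int __cdecl wmain(int argc, wchar_t* argv[])
    {
        // Look for the secret handshake anywhere on the command line.
        bool launchedByMaster = false;
        for (int i = 1; i < argc; i++) {
            if (wcscmp(argv[i], L"/run") == 0) {
                launchedByMaster = true;
                break;
            }
        }

        if (!launchedByMaster) {
            MessageBoxW(nullptr,
                L"Um, please don't run this program directly. Use the master.",
                L"Assistant", MB_OK | MB_ICONINFORMATION);
            return 1;
        }

        // ... okay, do that thing you do ...
        return 0;
    }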

You might decide to get really fancy and make the secret handshake super-complicated, but remember that there is no added security benefit here. The user can compromise assistant.exe by simply attaching a debugger to it, at which point any defensive mechanism you create can be disabled by a sufficiently resourceful attacker. (And there’s a class of people who will see that you put a lot of work into protecting your assistant, and that will just convince them to work harder to circumvent the protection. Because something with this much protection must certainly be very valuable!)

There’s also a benefit to keeping the secret handshake simple: It makes it a lot easier for you to debug the assistant program. Instead of having to set up the master and then get the master to do all the things it needs to generate the secret handshake for the assistant, you can just run your assistant directly with the magic flag, and boom, you’re off and debugging.

To make it even harder to run your program by accident, you can give it an extension that is not normally executable, like .MOD. That way, it cannot be double-clicked, but you can still pass it to CreateProcess or (with some cajoling) ShellExecuteEx.
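
Here’s a sketch of the master’s side of that launch (the path is made up); the point is that CreateProcess inspects the executable’s header, not its extension, so the .MOD file launches fine:

    #include <windows.h>

    bool LaunchAssistant()
    {
        // CreateProcess requires a writable command-line buffer.
        wchar_t cmd[] = L"\"C:\\Program Files\\Contoso\\assistant.mod\" /run";
        STARTUPINFOW si = { sizeof(si) };
        PROCESS_INFORMATION pi;
        if (!CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE,
                            0, nullptr, nullptr, &si, &pi)) {
            return false;
        }
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return true;
    }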

Comments (40)
  1. parkrrrr says:

    "And there's a class of people who will see that you put a lot of work into protecting your assistant, and that will just convince them to work harder to circumvent the protection. Because something with this much protection must certainly be very valuable!"

    Alternatively, something with that many anti-debugging "features" must surely be doing something nefarious. Which is why, for example, I refuse to use a certain product even after Microsoft acquired it.

  2. Joshua says:

    An entertaining way is to check whether the parent process is the master by looking at its image name.

    I don't use real anti-debugger tricks these days, so I can attach a debugger in production. I don't mind debugging helper programs by attaching to the process; otherwise they have nothing to do anyway.
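
    Roughly like this with the Toolhelp API (a sketch; "master.exe" stands in for whatever your master is called, and per the article, even this is trivially spoofable):

        #include <windows.h>
        #include <tlhelp32.h>
        #include <wchar.h>

        bool ParentIsMaster()
        {
            DWORD myPid = GetCurrentProcessId();
            DWORD parentPid = 0;
            HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
            if (snap == INVALID_HANDLE_VALUE) return false;

            // Pass 1: find our own entry to learn the parent PID.
            PROCESSENTRY32W pe = { sizeof(pe) };
            if (Process32FirstW(snap, &pe)) {
                do {
                    if (pe.th32ProcessID == myPid) {
                        parentPid = pe.th32ParentProcessID;
                        break;
                    }
                } while (Process32NextW(snap, &pe));
            }

            // Pass 2: find the parent's entry and compare image names.
            // (Note: PIDs can be reused, so even this much is unreliable.)
            bool isMaster = false;
            pe.dwSize = sizeof(pe);
            if (parentPid != 0 && Process32FirstW(snap, &pe)) {
                do {
                    if (pe.th32ProcessID == parentPid) {
                        isMaster = _wcsicmp(pe.szExeFile, L"master.exe") == 0;
                        break;
                    }
                } while (Process32NextW(snap, &pe));
            }

            CloseHandle(snap);
            return isMaster;
        }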

  3. Karellen says:

    Why do the assistants have to run in separate processes?

    Couldn't one just put the assistants in DLLs, either linking them directly or using LoadLibrary()/GetProcAddress() at runtime, and then run the assistants with CreateThread()?

    (For debugging, "master /debug-assistant AssistantFoo")
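
    Something like this, as a sketch (the DLL name and export are made up):

        #include <windows.h>

        // Matches LPTHREAD_START_ROUTINE so it can run on a worker thread.
        typedef DWORD (WINAPI *ASSISTANTPROC)(LPVOID);

        HANDLE RunAssistant(LPVOID context)
        {
            HMODULE mod = LoadLibraryW(L"AssistantFoo.dll");
            if (!mod) return nullptr;

            ASSISTANTPROC proc = reinterpret_cast<ASSISTANTPROC>(
                GetProcAddress(mod, "AssistantMain"));
            if (!proc) { FreeLibrary(mod); return nullptr; }

            // The DLL stays loaded for the life of the thread in this sketch.
            return CreateThread(nullptr, 0, proc, context, 0, nullptr);
        }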

  4. Lars Viklund says:

    Karellen: There are many benefits to the overhead of multiple processes.

    On 32-bit platforms, you get more address space in total as it's per-process.

    Crashing or relatively untrusted code only ruins things for itself.

    You cut down on things like heap fragmentation and resource exhaustion in your long-running master, since a separate process gets perfect cleanup of its resources when it exits.

    You can elevate child processes and not have to care about your master itself running elevated.

  5. Joshua says:

    @Karellen:

    1: Loading a 32 bit DLL to call from a 64 bit process.

    2: Containing a component that is not trusted not to corrupt the process's memory.

  6. Dan Bugglin says:

    @Karellen That is one approach, but there are merits to using separate processes.  For example, Google around and you can find lists of reasons why Chrome does it this way.  Off the top of my head: if the assistant program crashes, the entire program doesn't crash.  The assistant program can be given fewer security permissions, and thus if it is hacked the consequences are limited.  It is easier for the user to measure individual memory and CPU consumption for different aspects of your program (although shared memory blocks, which you'd want to use to avoid extra memory consumption as much as possible, muddy the waters a bit).

  7. Joker_vD says:

    Also, sometimes you just don't want to spend the effort to turn a standalone program into a DLL plugin. For example, OpenSSH — it's one thing if you ship your product together with ssh.exe (and fifty megabytes of Cygwin libraries), and another if you take the sources and try to compile-and-link them into your solution. Given how capricious Cygwin is, that's not really worth the effort. Just write to/read from the pipe.

  8. Random832 says:

    @Joker_vD But if you're shipping a standalone program that is designed to be run on its own, why do you care if the user runs it?

  9. Random832 says:

    Often because the login credentials are located in the parent process.

  10. GregM says:

    "Why do the assistants have to run in separate processes?"

    I do it for 4 main reasons, some mentioned before:

    …more address space…

    …Crashing or relatively untrusted code…

    …resource exhaustion…

    The thing that I put in a separate process is a third party component that is known to be crash prone, memory leak prone, loads hundreds of DLLs, and is not thread-safe.  So we get safe parallelism using multiple processes instead of multithreading.  Since the component is converting a file from one format to another, it's perfectly suited for this.  Yes, there is some I/O contention, but it's still a lot faster, especially on some of the machines we run on (dual quad core hyperthreaded, for 16 parallel processes).

  11. Yuri says:

    @Karellen

    Where I work, we do it because our main legacy product is written in a language using a framework that isn't easily interoperable with newer technologies.

  12. jcs says:

    That trick is as old as MS-DOS itself, having been commonly used in DOS games in the 80s and 90s. I assume it's because different parts of the game were written in different languages, or by different development teams.

  13. Nicholas says:

    @Karellen

    Sometimes 32-bit is all you need.  Going 64-bit is not a clear cut decision.  The following article, by the awesome Rico Mariani, explains the pros and cons:

    blogs.msdn.com/…/visual-studio-why-is-there-no-64-bit-version.aspx

    Developing 32-bit applications in a 64-bit environment seems to be a good balance.  Also, going 64-bit puts you in a different universe where the development tactics that you know may not apply very well.  Are you suggesting that it is worth going to 64-bit just so you can allocate a 7 GB array and call qsort on it?  When you are dealing with big data your tactics and techniques change drastically, so bit-ness may not be the big factor here.  Sure, if you are SQL Server then of course you are going to be 64-bit.  For the average program, 32-bit is sufficient.

  14. RangerFish says:

    Incompatibility between languages and architectures is definitely one reason to have separate processes. Other considerations may relate to how processes are allocated time on the CPU or how they can reserve memory. Even how many threads a process can reasonably have might be an issue.

    Another reason I've seen is simple compartmentalisation. We have a server process which consists of a manager service which invokes several child processes. Each child process has a relatively simple, well-defined job, and having them separated means it's much easier to distribute the components across different machines or set up configurations where only some of the components are installed. It also means that those components can be developed independently (or at least, more independently) of each other.

    In response to the general concept, we also do something similar where an installer must be run by its bootstrapper, though in that case, rather than check to see if the parent process passed a flag, we check to see if certain properties of the environment match what we expect (in particular, is the process elevated?).

  15. Graham says:

    The official recommendation from Microsoft is that if you need to make MAPI calls as another user, you shouldn't impersonate the current user in a different thread, but kick off a separate process as that user. I think Raymond has also mentioned that running any kind of COM under impersonation is a bit risky, since you're relying on everything knowing how to handle the impersonation. If you run things as a separate process under a different user id, all those issues go away.

  16. Crescens2k says:

    @Nicholas:

    The question is, sufficient for how long.

    I am not the kind of person to agree with the whole idea of "sometimes 32 bits is enough", because that can cause the wrong kind of mindset: "Why progress when what we have is enough?"

    Sure, stability is good, and changing for the sake of it is bad, but that is no reason to stick to the lowest common denominator if you can take advantage of more.

    This can help future-proof your application if its dataset isn't defined, and it also gives you access to things on the processor that you wouldn't have in 32 bit mode.

    So I would say even the claim that "sometimes 32 bit is enough" is itself not clear cut. Take video players and web browsers: there was a time when it was felt that 32 bits was enough for those, and some people still think so. But I know people who like to keep as many tabs open as possible and have had a 32 bit browser run out of address space. Not to mention browsers themselves are getting a lot more complex. Video playback is going through that kind of thing too: 1080p encodes can be rather large, but the next step is towards 2K and 4K.

    So you can never tell if a "this is enough" will suddenly one day be not enough.

  17. Fleet Command says:

    My assistant programs do not perform safety checks because, without a master to send them the data they need to do what they do, they simply exit gracefully.

  18. Jon says:

    @Crescens2k

    There is a real cost in terms of performance and memory to go 64 bit. This is why there's now an x32 ABI for Linux and why things like ARM THUMB exist.

  19. Karellen says:

    Ah yes, running your assistant in a sandbox. That does make sense – thanks everyone who suggested it.

    Everyone stuck with 32-bit code and the problems it comes with, my deepest sympathies. To think, in this day and age, that some people still argue *against* using 64-bit code wherever possible!

    As for isolating (poorly written?) 3rd party components, I got the impression that the article was discussing the case where the "master" and "assistants" were not just developed by the same team, but were all part of a single development effort. Which part of the article are people reading that encompasses 3rd party assistants?

  20. GregM says:

    "As for isolating (poorly written?) 3rd party components, I got the impression that the article was discussing the case where the "master" and "assistants" were not just developed by the same team, but were all part of a single development effort. Which part of the article are people reading that encompasses 3rd party assistants?"

    My master and assistant are developed by the same team and are part of a single development effort.  The assistant uses the third party component that has those problems.  The third party component isn't a stand-alone exe.

  21. GregM says:

    Running 32 bit code from a 64 bit process also pops up here, as I have occasion to call an older version of the assistant that was only built as 32 bit (it predated our move to 64 bit).

  22. J says:

    @jcs:

    > That trick is as old as MS-DOS itself; having been commonly used in DOS games in the 80s and 90s. I assume it's because different parts of the game were written in different languages, or by different development teams.

    In many cases, it's actually because machines at the time didn't have the resources to handle the entire game at once.

    As an example, the original X-Com had two different executables – one for the tactical missions, and one for the strategic overworld. When you started a mission, the overworld executable would write the state of the game to a file, start the tactical executable, and then exit, leaving all the resources of the machine available to the tactical executable. When you finished the mission, the tactical executable would write the results of the mission into the save file and then restart the overworld.

    Think of it as a high-level version of paging code in and out of memory as required.

  23. Sam says:

    A related usage mode: the "assistant" process may not even run on the same computer. This is how one of our scientific computing programs works: there's a 32 bit GUI front end for viewing the data, and a solver executable that does the real computing. It's split so that the solver can run on clusters, can be built as a 32 or 64 bit version, can run on other operating systems, and requires no GUI and a minimal set of shared libraries (which is important on clusters).

  24. Azarien says:

    @Karellen: I still think the proper way is "compile as 32 bits, and don't bother with 64 bits unless really needed".

  25. Crescens2k says:

    @Jon:

    The converse is also true.

    For a 32 bit process to run on a 64 bit system, it has to go through an emulation layer, and this is not free.

    For Windows, this means that system calls go through the 32 bit ntdll, which repackages the arguments for 64 bit and then calls into the 64 bit ntdll to do the actual system call. Don't forget that on 64 bit systems, 32 bit processes also have to have two stacks and load more libraries. This also changes the semantics of some things, like exceptions.

    Another thing to remember is that while in 64 bit mode, the processor has access to double the amount of registers compared to 32 bit mode. So while there is a higher memory cost, register contention is usually lower and so code has to store things temporarily to the stack less. This is becoming more and more true with each compiler release as the x64 code generation is becoming better. So you can't just stick to one architecture unless the other is really needed, as you can never know the actual performance differences unless you really test.

    @Azarien:

    I will always think that the proper way is to check the requirements, and do lots of testing to decide which way to go. Lower register contention and 64 bit registers/instructions may be more desirable without the larger address space.

  26. j b says:

    @Jon,

    ARM Thumb does not affect the word length of the CPU. ARM instructions are encoded in either 16 or 32 bits, with Thumb (roughly) being the instruction subset that is encoded in 16 bits. The smaller ARM processors, like the M0, can only process these instructions, while larger models process both the 16 bit and 32 bit ones. Even 16 bit Thumb instructions operate on 32 bit values and registers.

  27. Azarien says:

    @Crescens2k: the requirements are usually "must work", and that includes 32-bit Windows machines. Performance comes second.

  28. voo says:

    In my experience the additional memory pressure for 64-bit is generally evened out by the additional instructions and registers, so that it doesn't make any noticeable difference performance-wise (on the other hand, there are some programs that benefit immensely from 64-bit mode even when not using more memory).

    Managed code is rather interesting in that regard, since it can get the benefit of more memory (only up to 32 GB, though), plus the additional registers and instructions, without the increased memory pressure, at the rather small cost of a few extra instructions for memory access. Does the CLR do that optimization too, or are JVMs the only ones doing that?

  29. Jon says:

    @j b: But the performance aspects are similar. In THUMB you gain with the smaller instruction length at the expense of having more instructions to do certain operations. In x64, the register and memory enhancements come at the expense of lower code density. In a way, THUMB is sort of rolling back some of the RISC aspects for the architecture. (And let's not get started about VLIW)

  30. j b says:

    @Jon,

    Certainly, for some operations, there are (non-Thumb) 32-bit ARM instructions doing in one instruction what would require two or more Thumb instructions. For performance judgements, an important question is how often you really use those fancy instructions. Not very often. The ARM designers did a very good job in the dynamic analysis of actual code, selecting the instructions to become 16-bit Thumb codes as those executed very frequently. If one out of a hundred instructions actually executed is one which would require more than one Thumb instruction, it has very little effect on the total performance. (The 32-bit ARM instructions are not THAT complex!) One per hundred is a quite realistic figure in a lot of code. Obviously, there are cases that differ, but unless the CPU provides really complex operations that you would otherwise find as a source code function (say, hyperbolic functions) as single-instruction codes (ARM doesn't), you might be surprised by how large a percentage of actually executed instructions are Thumb instructions that couldn't be replaced by fewer 32-bit ARM instructions.

    There is a cost to the 32-bit instructions, too: They frequently do not complete in one clock cycle (as almost all the Thumb instructions do), they put a higher load on the memory bus etc. So before jumping to the conclusion that going all 32-bit is faster (obviously: At the same clock frequency), you must add these costs.

    Too often, you see comparisons of several alternatives, each in its own environment different from the others, and the conclusion for the SYSTEMS is applied to one selected component (such as the CPU). Full 32 bit (non-Thumb) ARM CPUs commonly run at higher frequencies, with faster memory/bus technologies, possibly with cache – faster by (system) design, not (primarily) by providing more complex instructions. Fortunately, all full 32 bit ARM processors can run Thumb code without recompilation, so if you want to do a fair comparison, you should run the Thumb code, and then the same source code compiled for 32 bit non-Thumb code, on exactly the same hardware and system software. I haven't tried it myself (I don't have access to any great selection of ARM processors!), still I would be surprised if the performance improvement was more than a few percent. I would be less surprised if you could get even better performance by using a small Thumb CPU (say, an M0) and cranking up the clock frequency a little, and still be way below the larger ARM CPUs in power consumption. True: All ARM processors have a very nice performance/power ratio, but they really excel in the lower range. Unless you need other 32-bit features, memory management, more complex interrupt handling etc., you may often be well served by one of the smaller Thumb-only CPUs even if you have to increase the clock frequency to reach the same performance. It still might save you power (if that is essential in your application).

  31. GregM says:

    "Why are people thinking that their software should be written to be compiled as 32-bit OR 64-bit?

    Your software should compile as 32-bit AND 64-bit, with nothing more than the flick of a compiler switch."

    Absolutely.  We build, install, and test the 32 bit and 64 bit versions every night, and publish both to our end users for every release.

  32. Karellen says:

    Why are people thinking that their software should be written to be compiled as 32-bit OR 64-bit?

    Your software should compile as 32-bit AND 64-bit, with nothing more than the flick of a compiler switch.

    Then you can use either, depending on your use cases. So for example, if you find that you are running out of address space or hitting some other problem because of the 32-bit-ness of one binary, you can just use the 64-bit version instead.

    If you're worried about performance issues and think that 32- or 64-bit might be significantly better than the other for your specific use case, the only way to tell is – as always – to do both and *actually measure the performance* so you can compare real numbers. And to do that, you have to build both anyway.

    [It also doubles the testing load. (Or quadruples if you want a 64-bit master to work with a 32-bit assistant.) If you are writing an in-house application, you probably have barely enough resources to test one version, much less two. -Raymond]
  33. smf says:

    64 bit code is faster, so it's definitely worth using it if you need your program to go as fast as possible.

    The only way to make sure your code is run in the way you intend is to make sure you don't let anyone else install it. You would have to supply it on a seriously locked-down machine, though, or just not let anyone run it.

    Trying to get your code to check who runs it would be like warning your staff not to accept new counterfeit bills that are indistinguishable from the real thing.

  34. Magnus says:

    As well as digging through Program Files and finding programmes there, with the new Task Bar in Windows 7 it is far too easy to start a separate invocation of a programme.

  35. immibis says:

    @smf: see the second paragraph of the post.

  36. Crescens2k says:

    @Azarien:

    From my first post in these comments

    "Sure, stability is good, and changing for the sake of it is bad, but that is no reason to stick to the lowest common denominator if you can take advantage of more."

    Sure, getting it to work on 32 bit systems is important, but why would that stop you from taking advantage of the 64 bit system? Also, what about Windows Server Core or Windows PE environments, where the WoW64 subsystem isn't available by default? More and more companies are releasing multiple builds of their applications which you can choose between or even install side by side. You can detect processor capabilities at runtime, so why not take advantage of that fact?

    One really awesome bit of software I would like to shine the spotlight on here is Process Explorer. It takes advantage of the fact that the average system is capable of running 32 bit code: it detects the processor, and if it finds that it is a 64 bit system, it extracts the 64 bit binary, which is stored as a resource, and then re-launches itself using the 64 bit binary.
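
    The mechanism is easy enough to sketch (the resource name and file name here are invented; this is not Process Explorer's actual code):

        #include <windows.h>
        #include <string.h>

        bool RelaunchAs64Bit()
        {
            // Hypothetical RCDATA resource holding the 64 bit image.
            HRSRC res = FindResourceW(nullptr, L"BIN64",
                                      MAKEINTRESOURCEW(10) /* RT_RCDATA */);
            if (!res) return false;
            HGLOBAL block = LoadResource(nullptr, res);
            DWORD size = SizeofResource(nullptr, res);
            void* data = block ? LockResource(block) : nullptr;
            if (!data || size == 0) return false;

            // Write it to a temporary file...
            wchar_t path[MAX_PATH];
            GetTempPathW(MAX_PATH, path);
            wcscat_s(path, L"assistant64-sketch.exe");
            HANDLE file = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                      nullptr);
            if (file == INVALID_HANDLE_VALUE) return false;
            DWORD written;
            WriteFile(file, data, size, &written, nullptr);
            CloseHandle(file);

            // ...and relaunch ourselves as the 64 bit binary.
            STARTUPINFOW si = { sizeof(si) };
            PROCESS_INFORMATION pi;
            if (!CreateProcessW(path, nullptr, nullptr, nullptr, FALSE,
                                0, nullptr, nullptr, &si, &pi)) return false;
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
            return true;
        }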

    So I am all for using the system as much as possible, I am also for actual performance testing to see if there are possible gains.

  37. Marc K says:

    @Crescens2k:

    I don't like the method Process Explorer uses.  When they first implemented this, the size of the executable doubled.  And then you have both the original 32-bit and the 64-bit running in memory.  The 64-bit version also needs to be extracted to a user-writable area of the system.  So, that executable is not properly secured.

    I'd rather they just package separate exes in the zip file and have the 32-bit version launch the 64-bit version if a 64-bit OS is detected.

  38. Crescens2k says:

    Well, I was using that as an example of how easy it is to detect these things, not about the layout itself.

    Mental note: disclaimers would avoid this kind of situation, where things I find obvious aren't obvious to other people, or context is lost on them.

  39. Joe says:

    Had to do exactly this two years ago. My solution was to:

    1) Pass an integer argument to the child process

    2) The child process would use the integer to create a name and use it as the base for a shared memory queue and a shared memory settings block.

    3) The child process would find the parent and wait on its process handle, to ensure the children shut down when the parent was abruptly terminated (sketch below).
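
    Step 3 in rough outline (a sketch; assume the child has already determined the parent's PID, e.g. with a Toolhelp snapshot):

        #include <windows.h>

        // Runs on its own thread in the child: exit when the parent goes away.
        DWORD WINAPI WatchParent(LPVOID param)
        {
            DWORD parentPid =
                static_cast<DWORD>(reinterpret_cast<UINT_PTR>(param));
            HANDLE parent = OpenProcess(SYNCHRONIZE, FALSE, parentPid);
            if (parent != nullptr) {
                // A process handle becomes signaled when the process exits,
                // even if it was abruptly terminated.
                WaitForSingleObject(parent, INFINITE);
                CloseHandle(parent);
            }
            ExitProcess(0);
            return 0; // not reached
        }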

  40. smf says:

    @immibis  

    I read it and understood it; I was just restating the issue.

    No matter what you do to try and stop executable b from running unless it's called from executable a, someone else can take executable a and extract your magic incantation. Everything you can do to verify that you were started by executable a can be impersonated. As soon as your executable can be run on a computer you do not control, expect that it can be compromised.

    You can protect against the user accidentally clicking on executable b, but that doesn't answer "How can I make sure my program is launched only from my helper program and no other parent?", because "only from my helper program" means in no circumstance at all, ever, no matter what the person does, because "I" can't cope with the terrible consequences that will happen if anyone is ever able to do that (like I'll lose my job/money/life, etc.).

Comments are closed.