Is WriteProcessMemory atomic?


A customer asked, "Does Write­Process­Memory write the memory atomically? I mean, if I used Write­Process­Memory to write 10 instructions for a total of 20 bytes, can Write­Process­Memory write those 20 bytes atomically?"

CPUs typically provide only modest atomic update capabilities. The x86 family of processors, for example, can update up to eight bytes atomically. Twenty bytes is beyond the capability of the processor.
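
To make the hardware limit concrete, here is a minimal C sketch that asks the compiler whether an object of a given size can always be updated with a single lock-free atomic operation. It assumes GCC or Clang, since `__atomic_always_lock_free` is a compiler builtin rather than standard C, and `Patch20` is a made-up stand-in for the customer's 20 bytes of instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the customer's 20-byte patch. */
typedef struct { unsigned char bytes[20]; } Patch20;

static_assert(sizeof(Patch20) == 20, "the patch must be exactly 20 bytes");

/* True if an object of this type can always be updated with a single
   lock-free atomic operation on the target CPU. The second argument 0
   means "assume typical alignment". GCC/Clang builtin, not standard C. */
#define ALWAYS_LOCK_FREE(type) (__atomic_always_lock_free(sizeof(type), 0))

/* An 8-byte value: within reach of the processor's atomic operations. */
bool u64_is_lock_free(void)
{
    return ALWAYS_LOCK_FREE(uint64_t);
}

/* A 20-byte block: no single atomic operation covers it. */
bool patch_is_lock_free(void)
{
    return ALWAYS_LOCK_FREE(Patch20);
}
```

On mainstream 64-bit targets I would expect `u64_is_lock_free()` to return true and `patch_is_lock_free()` to return false. No function layered on top of the processor can turn the second answer into a yes.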

I was kind of baffled at what sort of mental model of computing the customer had developed. It apparently permits Write­Process­Memory to accomplish something that the CPU is not physically capable of performing.

"Will my aluminum hammer withstand temperatures above 700C?"

Given that aluminum melts at 660C, it doesn't matter whether you make a hammer or a ladder or a scaffold. As long as you make it out of aluminum, it will melt at 660C because that's a fundamental property of aluminum.

The only thing I can think of is that the customer thought that maybe the kernel suspended all of the threads in the process, updated the memory, and then unfroze them all. It wouldn't be an atomic update in an absolute sense (somebody else doing a Read­Process­Memory might read an in-progress write), but it would be atomic from the viewpoint of the process being written to.

But no, the Write­Process­Memory function does no such thing. It merely writes the memory into the process address space.

Another way of thinking about it is using the thought experiment "Imagine if this were true." If it were true that Write­Process­Memory provided atomicity guarantees for 20 bytes, then all sorts of multi-threaded synchronization problems would magically disappear. If you wanted to update a block of memory in your process atomically, you would just call Write­Process­Memory on your own process handle!

I noted that the underlying scenario sounds really fishy. Using Write­Process­Memory to update code in a process sounds an awful lot like the customer is writing a virus. One of my colleagues who studies malware agreed, adding, "On the other hand, some anti-malware products also use that approach, as dubious as it is. For the record, I would like to add, 'yuck'." My colleague asked the customer for further details on what they are doing, and why they think that Write­Process­Memory is what they need, so that a proper solution to their underlying problem could be developed.

We never heard back from the customer.

Comments (33)
  1. Deduplicator says:

    You totally psyched that one, no wonder he ran.

  2. Joshua says:

    And the punchline I was expecting was "with respect to other calls to ReadProcessMemory and WriteProcessMemory". Incidentally, atomic means succeeds in whole or fails in whole, while the property being talked about here is isolation.

    It would be interesting if WriteProcessMemory in fact could not succeed in part and fail in part but I'm pretty sure that is not true.

  3. Matt says:

    The kernel can totally write 20 bytes via WriteProcessMemory atomically if it chooses to do so:

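    void AtomicWriteProcessMemory(HANDLE hProc, void* lpDst, const void* lpSrc, size_t cbSize)
    {
     // SuspendAllThreads and ResumeAllThreadsThatIJustSuspended are
     // hypothetical helpers, not real Win32 functions.
     SuspendAllThreads(hProc);
     WriteProcessMemory(hProc, lpDst, lpSrc, cbSize, NULL);
     ResumeAllThreadsThatIJustSuspended(hProc);
    }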

    [This was discussed in the article, -Raymond]
  4. Matt says:

    Also, "updating code sounds like a virus" assumes the worst of your customers. AntiVirus companies, AppCompat libraries, debuggers, ProcDump, WerFault, live HeapDebuggers, VMMap, people hooking their games to make them "more cool" (or to bot the game) and a whole ton of other legitimate (albeit quite low-level debugging) tools make use of this functionality.

    The thing that stops virus writers is not atomicity of WriteProcessMemory, but the ACL check of OpenProcess. If a virus writer can OpenProcess with PROCESS_VM_WRITE access, you've already lost, atomic writes or not.

  5. Gabe says:

    Matt: Yes, I suspect that the customer was asking if that's how it was implemented.

    That makes me wonder, though, how the SuspendThread operation is implemented. If it can generate an interrupt on the CPU executing the thread, it can stop immediately. If it just sets a bit in the thread's state saying that it's suspended, the thread will execute until its quantum runs out or it is preempted. If it had to wait until the threads' quanta expired, it would be a very slow call indeed.

    However, I don't know that this question is necessarily nefarious. If I were writing a debugger, isn't it reasonable for me to have the same question?

    [You'll learn more about this next February. -Raymond]
  6. Yuri says:

    If you were competent enough to write a debugger I doubt you would ask if writing large blocks to memory can be performed atomically. Smells fishy for sure. I wouldn't even consider AV programs as legitimate.

  7. Mike says:

    Matt, Gabe: I am hard pressed to find a scenario where a debugger might write large blocks of memory while the process is still running. Keep in mind that in this case, it wouldn't even need the Kernel to suspend the threads: The debugger can (and usually does) suspend them itself.

    Yuri: Symantec seems to agree with you: http://www.zdnet.com/antivirus-is-dead-long-live-the-antivirus-7000029078

  8. Matt says:

    @Mike: "The debugger can (and usually does) suspend them itself".

    So? The debugger can (and sometimes does) do VirtualProtectEx under the hood to ensure that the destination has WRITE permissions, but WriteProcessMemory will do that for you if you forget. Suspending threads is slow, and is complicated by the fact that you need to iterate until all of the threads are suspended to beat race conditions (in case a new thread is created during your operation), and you need to ensure that you only restart the threads that you suspended (i.e. not accidentally "restart" threads that were suspended before the operation).

    That's work you CAN do yourself, but if the kernel is doing it already, it's work that's not necessary.

    Which is why the question "Does WriteProcessMemory operate atomically?" is an entirely legitimate question.

    @Yuri/Mike: "I am hard pressed to find a scenario where a debugger might write … while the process is still running"

    HotPatching a process to monitor API calls (e.g. calls to HeapAlloc) on another running process is one such example.

    [The debugger can just patch HeapAlloc at the initial process breakpoint, at which point it has already frozen all threads in the process, so there is no need to patch the function while the process is running. Similarly, if the function is in a dynamically-loaded DLL, it can patch the function at the DLL load notification breakpoint. -Raymond]
  9. Adrian says:

    As to the comment about someone competent enough to write a debugger should already know about atomic writes …

    Lots of not-yet-competent people learn by doing.  I'm always astounded by the number of questions on Stack Overflow about DLL injection, detours, hooking, keylogging, and other techniques often used by debuggers and malware and anti-malware and regular old apps written by not-yet-competent people.

  10. Yuri says:

    @Adrian

    Yep the kind of questions Dread Pirate Roberts would ask ;)

  11. Andre says:

    I too think that the only fair interpretation of the question is the second one. Is the write atomic to the target process, i.e. does the kernel do suspending. The "Imagine if this were true" experiment isn't helpful either. The question is very likely not about writing to your own process, and (assuming the kernel did suspend and resume threads) it would just be an awfully inefficient lock mechanism.

    About the whole scenario being fishy: Apparently there are reasons for Write­Process­Memory to exist at all. At first glance, the whole function seems fishy, before you think about debuggers and the like.

  12. GrumpyYoungMan says:

    Isn't the underlying mystery how a virus writer could afford a support contract with Microsoft so that they were able to ask the question?

  13. Matt says:

    @Raymond: "[The debugger can just patch HeapAlloc at the initial process breakpoint, at which point it has already frozen all threads in the process, so there is no need to patch the function while the process is running. Similarly, if the function is in a dynamically-loaded DLL, it can patch the function at the DLL load notification breakpoint. -Raymond]"

    Starting the process in debug mode changes the behavior of lots of things, including the heap. Perhaps my debugger wants to analyze how the real heap works because of a heisenbug. Or perhaps I want to be able to hook a process that is actively hostile to debuggers (like, say, a game that I need to develop an app-compat shim for, but which has anti-debugging across the process).

    [Assuming you don't want to use -hd to disable the debug-mode heap, your debugger is still going to suspend all the threads as part of the initial attach, so the threads are already suspended at the time you need to do your patching. -Raymond]
  14. Yuri says:

    @GrumpyYoungMan

    Aren't they all interns for the CIA anyway?

  15. Klimax says:

    Might be naïve porting of ancient code which uses self-modifying code… (IIRC an old version of ZDoom used it.) It could also be for some fringe non-malware, non-security field like the demoscene.

  16. Stefan says:

    This question doesn't sound as ridiculous to me as it may have sounded to you.

    I know at least one operating system (INTEGRITY) which works like this: invoking a syscall suspends the scheduler. Most syscalls have a bounded runtime. Those that can run for a longer time periodically check for a preemption request, and then save state, return, and ask the application to retry the call.

    In that operating system, the equivalent of WriteProcessMemory for 20 bytes would be atomic (I believe the threshold for allowing preemption is somewhere around 128 bytes).

    This works perfectly on single-core processors. I don't know how they do multi-core, though.

  17. Joshua says:

    @Stefan: Having dealt with a multi-core OS with this property, the answer is multi-core isn't worth much anymore.

  18. dave says:

    The suspend-all-threads approach doesn't get you memory atomicity. There are non-CPU agents that can access memory.  Memory atomicity has to involve the memory controller.

  19. f0dder says:

    Raymond, if somebody were writing a virus, they'd be stupid to file a customer ticket with Microsoft – even back in the days of dialup, there was a lot of information easily available for people who wanted to do that kind of thing. If you were a normal developer who didn't have interest in those keywords, a lot of lower-level coding stuff would slip you by.

    Hoping that this doesn't violate the "don't mention product names" rule :) – I bugfixed the Windows re-release of "a game about extra-terrestrials invading earth" (the original two, the rest of the series sucked) with memory patching. The original "game platform which name is something you'd find on a pipe" re-re-release of those games used my patch, although I believe they've moved to DOSbox now. So there's definitely non-malicious uses for doing memory patching of code.

    Also: "The debugger can just patch HeapAlloc at the initial process breakpoint" – are all threads automagically suspended when a debugger does "attach to process"?

  20. Matt says:

    >> [Assuming you don't want to use -hd to disable the debug-mode heap, your debugger is still going to suspend all the threads as part of the initial attach, so the threads are already suspended at the time you need to do your patching. -Raymond]

    This is just becoming an example of moving the goalposts. Suppose I'm live-hooking a process that does a query to NtQueryInformationProcess to find if the process is attached, which means that I can't attach a debugger like WinDbg. Perhaps another example might be someone hotpatching a process, in much the same way that Microsoft (used to) hotpatch binaries as part of an auto-update.

    The point is that there are lots of reasons why someone might want to WriteProcess to another process (because otherwise why is that function even there?). If you concede that someone might want to WriteProcess to another process, you have to concede that they might want to do so atomically. Which makes the question not quite as ridiculous as you claim it to be.

    [At the end of the day, somebody needs to suspend all the threads if you need intra-process atomicity. Intra-process atomicity is an odd requirement and is not the common case use of WriteProcessMemory, so it would make sense to shift the work outside WriteProcessMemory. (And, as I noted before, WriteProcessMemory exists primarily for debuggers, and debuggers already are suspending threads anyway.) -Raymond]
  21. Myria says:

    I asked pretty much the same question as this article – how to implement atomic patching – on the OSR Online mailing list and got flamed every which way for it.  The difference is, I was writing a kernel driver, and was asking from the perspective of how to suspend everything, since obviously the processor can't do the writes atomically.  I.e. I know what I'm doing in terms of patching, though not much in terms of NT kernel development.  And no, I'm not writing malware. >.<

    WriteProcessMemory is convenient in that it calls NtProtectVirtualMemory for you in addition to NtWriteVirtualMemory – you don't have to call VirtualProtectEx yourself unless you're writing to a few things, such as import tables.

    I think programmers are too afraid of self-modifying code now.  There are a couple of things that would benefit from it.  Selection among code paths based on CPU features is one thing – you can avoid indirect branches by patching jumps at startup after detecting whether the CPU supports AVX.

  22. voo says:

    I'm now going to be absolutely nitpicky and all, but: x86 actually supports double CAS using CMPXCHG16B, so we could actually update 16 bytes atomically.

    Still not 20 bytes, though, so useless for this question.

  23. JM says:

    "I think programmers are too afraid of self-modifying code now."

    With good reason. It's murder on the pipelines.

    Of course a one-time patch like you describe is OK. But that's not really what most folks call "self-modifying code".

  24. lwahonen says:

    Or they were patching interop to some horrible, horrible old system.

  25. Guest says:

    Sorry, I didn't get it. What will happen "next February"?

    ["That makes me wonder, though, how the SuspendThread operation is implemented." -Raymond]
  26. Cesar says:

    @Myria: if you are at startup, there's only one thread – the one running main() or WinMain() or wWinMain() or whatever it's called on Windows these days. Or there should be, since Windows programmers love injecting DLLs and threads on unrelated processes and making them crash in unexpected and exciting ways. But even then, these injected threads and DLLs are not running *your* code (I hope), so you can still patch it before doing any real work without fear of race conditions.

  27. Fleet Command says:

    Assuming bad faith in a customer, aren't we? This is probably someone who had read a feature of atomic write somewhere and thought "Wow! That's awesome. Let's always do it."

    Reminds me of when I was trying to use a database transaction to ensure power outage resiliency, whereas I should have used change-log flushing. It was my first time and the documentation was poor. But someone in the know thought it was fishy that I was starting a transaction for only one change and concluded that I was doing something behind the scenes. Fortunately the misunderstanding was cleared up quickly. (Still, he didn't know about change-log flushing either. For him, a UPS was God's answer to all his prayers.)

  28. caf says:

    I think it is quite unlikely that the interrogator was writing a virus, because a virus writer wouldn't really care about a rare race condition that could crash the target process.

  29. Andre says:

    [At the end of the day, somebody needs to suspend all the threads if you need intra-process atomicity. Intra-process atomicity is an odd requirement and is not the common case use of WriteProcessMemory, so it would make sense to shift the work outside WriteProcessMemory. (And, as I noted before, WriteProcessMemory exists primarily for debuggers, and debuggers already are suspending threads anyway.) -Raymond]

    Yes. You are giving a perfectly reasonable rationale for why WriteProcessMemory was designed not to do that.

    That doesn't mean the question about it is horrible itself.

    [Shouldn't the person have done this thought experiment before asking the question? "What would it mean for WriteProcessMemory to be atomic given that the CPU is not capable of doing so? And why would the OS add this extra complexity?" -Raymond]
  30. RegularReader says:

    > Sorry, I didn't get it. What will happen "next February"?

    Raymond has a queue of blog entries that he's written but not yet posted.  That way, he doesn't have to write on a schedule – he can write whenever he likes, and they get posted on a regular schedule.  Last I heard, the queue was a bit over a year long.  So I think he has written a blog entry that answers your question, and it's currently in the queue and will probably be published in Feb 2015.  (Stuff sometimes gets moved around in the queue, so that date is subject to change).

  31. Anonymous Coward says:

    I've used Write­Process­Memory for several little debugging tools, a bunch of compatibility shims and to assist in virtualising some files and devices for the purpose of automation and to allow processes to communicate with each other in novel ways.

    As to answer Joshua's question: ERROR_PARTIAL_COPY

  32. Wizou says:

    Actually, freezing all threads before Write­Process­Memory wouldn't achieve atomicity even from the viewpoint of the process.

    If one of the threads of the process is in the middle of a memcpy from the target area, it might copy 8 bytes from the previous data of the area, get frozen, then after resume it would copy 12 more bytes from the newly written data.

    That's because x86 doesn't provide atomicity for read operations beyond 8 bytes either.
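
    The interleaving described above can be replayed deterministically on a single thread. This is only a sketch of the scenario, not of anything Write­Process­Memory itself does: the 0xAA and 0xBB fill bytes, the constants, and the function name are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { REGION_SIZE = 20, FIRST_CHUNK = 8 };

static_assert(FIRST_CHUNK < REGION_SIZE, "the reader must be interrupted mid-copy");

/* Replays the scenario step by step: the reader copies the first 8 bytes,
   is "frozen" while the whole region is rewritten, then resumes and copies
   the remaining 12 bytes. Returns true if the reader's copy ends up
   containing both old and new data. */
bool reader_sees_torn_copy(void)
{
    unsigned char region[REGION_SIZE];
    unsigned char copy[REGION_SIZE];

    memset(region, 0xAA, sizeof region);            /* old contents */

    memcpy(copy, region, FIRST_CHUNK);              /* reader: first 8 bytes */

    /* Reader is suspended here. The writer replaces all 20 bytes while
       no thread in the "process" is running. */
    memset(region, 0xBB, sizeof region);            /* new contents */

    /* Reader resumes and finishes its copy from the rewritten region. */
    memcpy(copy + FIRST_CHUNK, region + FIRST_CHUNK,
           REGION_SIZE - FIRST_CHUNK);

    bool has_old = false;
    bool has_new = false;
    for (int i = 0; i < REGION_SIZE; i++) {
        if (copy[i] == 0xAA) has_old = true;
        if (copy[i] == 0xBB) has_new = true;
    }
    return has_old && has_new;
}
```

    The write itself was never interrupted, yet the reader still ends up with 8 old bytes followed by 12 new ones.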

  33. j b says:

    Wizou,

    The write would still be atomic – in the scenario you describe read is what is non-atomic. If you read one piece, then go away and do something else, and come back later to read the rest, then AFI, YGI.

    Obviously, if you wrap the reads and writes into some monitor or critical region software mechanisms, you can make it atomic, as long as you can atomically write a flag associated with the memory area indicating "Stay off – I am busy with this area now!". It takes another byte (at least) of RAM for the flag, and a few cycles of CPU power, but the biggest problem is to force every user of the data to go through the locked gate. Actually, the overhead of a software controlled lock (based on atomic read/write flags – call them semaphores if you insist, but don't think of them as OS level semaphores with all THAT overhead!) can be made much lower than many people seem to think. The cost is quite reasonable (unless, of course, you have a thousand threads that insist on refreshing the shared value every millisecond).
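
    A minimal sketch of such a "stay off" flag, written with C11 `atomic_flag`. The `GuardedArea` type and the `area_*` functions are hypothetical names made up for this example, and it only protects callers that actually go through them, which is exactly the "locked gate" problem mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

enum { AREA_SIZE = 20 };

/* A 20-byte area plus the flag that guards it. */
typedef struct {
    atomic_flag busy;                   /* set while someone is in the area */
    unsigned char data[AREA_SIZE];
} GuardedArea;

static_assert(AREA_SIZE > 8, "the area is larger than one atomic store");

#define GUARDED_AREA_INIT { ATOMIC_FLAG_INIT, {0} }

/* Tries once to take the flag. test_and_set returns the previous value,
   so a previous value of false means the caller now owns the area. */
bool area_try_lock(GuardedArea *a)
{
    return !atomic_flag_test_and_set_explicit(&a->busy, memory_order_acquire);
}

/* Spins until the flag is acquired. */
void area_lock(GuardedArea *a)
{
    while (!area_try_lock(a)) {
        /* busy-wait; a real implementation would likely back off or yield */
    }
}

/* Releases the flag so the next caller can enter. */
void area_unlock(GuardedArea *a)
{
    atomic_flag_clear_explicit(&a->busy, memory_order_release);
}

/* Replaces all 20 bytes while holding the flag. */
void area_write(GuardedArea *a, const unsigned char *src)
{
    area_lock(a);
    memcpy(a->data, src, AREA_SIZE);
    area_unlock(a);
}

/* Copies out all 20 bytes while holding the flag. */
void area_read(GuardedArea *a, unsigned char *dst)
{
    area_lock(a);
    memcpy(dst, a->data, AREA_SIZE);
    area_unlock(a);
}
```

    The cost is one atomic test-and-set on entry and one atomic clear on exit, with no trip into the kernel. Any code that touches `data` directly, without going through `area_read` or `area_write`, gets no protection at all.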
