What does the COINIT_SPEED_OVER_MEMORY flag to CoInitializeEx do?

One of the flags you can pass to CoInitializeEx is COINIT_SPEED_OVER_MEMORY, which is documented as

COINIT_SPEED_OVER_MEMORY: Trade memory for speed.

This documentation is already vague since it doesn't say which direction the trade is being made. Are you reducing memory to increase speed, or increasing memory by reducing speed? Actually it's neither: If you pass this flag, then you are instructing COM to consume more memory in an attempt to reduce CPU usage, under the assumption that you run faster by executing fewer cycles.¹

The request is a per-process one-way transition. Once anybody anywhere in the process puts COM into speed-over-memory mode, the flag stays set and remains set until the process exits.
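To make the "one-way transition" concrete, here is a minimal sketch of a sticky per-process flag. It is a model of the semantics described above, not COM's actual code: the namespace, the function names, and the bit value are all hypothetical and chosen only for illustration.

```cpp
#include <atomic>

// Hypothetical model of a sticky per-process setting.
// None of these names come from COM itself.
namespace sketch {

// Illustrative flag bit; an assumption, not taken from any header.
constexpr unsigned kSpeedOverMemory = 0x8;

// One flag for the whole process, initially off.
std::atomic<bool> g_speedOverMemory{false};

// Stand-in for the flag-handling part of an initialization call.
// Any caller on any thread can turn the mode on.
void Initialize(unsigned flags) {
    if (flags & kSpeedOverMemory) {
        g_speedOverMemory.store(true, std::memory_order_relaxed);
    }
    // Deliberately no code path that clears the flag:
    // once set, it stays set until the process exits.
}

// Reports whether anybody in the process has requested the mode.
bool IsSpeedOverMemory() {
    return g_speedOverMemory.load(std::memory_order_relaxed);
}

}  // namespace sketch
```

In this model, calling sketch::Initialize(sketch::kSpeedOverMemory) once and then sketch::Initialize(0) any number of times leaves sketch::IsSpeedOverMemory() returning true, because a later caller that omits the flag cannot undo an earlier caller's request.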

When should you enable this mode? It doesn't matter, because as far as I can tell, there is no code anywhere in COM that changes its behavior based on whether the process has been placed into this mode! It looks like the flag was added when DCOM was introduced, but it never got hooked up to anything. (Or whatever code that had been hooked up to it never shipped.)

¹ As you know, consuming more memory is not a guarantee that you will actually run faster, because higher memory usage increases the chances that what you need will take an L1 cache miss or a page fault, which will cost you dearly in wait time (though not in CPU usage).

Comments (20)
  1. Dan Bugglin says:

    Maybe it's a placebo.

  2. Joshua Ganes says:

    I agree that "Trade memory for speed" is hardly descriptive. With SPEED_OVER_MEMORY within the flag, it seems pretty clear that speed is being favored. Unfortunately, it's much ado about nothing considering that the flag has no effects.

  3. dave says:

    Doesn't it mean "if you don't remember what this option means, it'll take you a little longer to write the code for the call" ?

  4. An L1 cache miss won't end your thread's timeslice – either the pipeline will stall for a few cycles while it waits for L2 cache or main memory to deliver the result, or it'll process a few instructions from the other thread if it's hyperthreaded. (Having L1 cache misses trigger scheduler activations would lead to all kinds of pain – particularly once the scheduler code itself, or the run queue, falls out of L1 cache.) Whether the wasted/lost core clock cycles count against your CPU usage depends on how that's being tracked: the CPU timestamp counter's behaviour varies between revisions; the number of periodic timer interrupts firing during your timeslice will be increased by L1 cache misses, since from a kernel perspective you're still scheduled on that core during the memory fetch.

    If it cached, say, the results of a registry lookup, they may have discovered the "cached" results weren't usefully faster than just fetching from the Registry in the first place. There are more than a few "optimisations" out there which really slow things down in reality…

    Could anything you access via COM read whether this flag is set? I could imagine their use case involving other code rather than COM itself taking account of the setting – or data transfer: using a megabyte buffer instead of transferring a page at a time, for example – and with bigger memory sizes these days, the context-switching cost may have dwarfed the benefits of saving a megabyte of RAM, so the latter option could then have been dropped.

    Reading the older article reminded me of some profiling on web servers; under heavy load, generating the timestamp string sent in the reply was accounting for non-trivial time in itself, so that server was changed to update a shared date string no more than once per second. With thousands of requests per second, caching that 28 character string (and the associated system call to retrieve the current time) made a difference in itself.

    [The remark about CPU usage had to do with the page fault case. When a thread takes a page fault, the thread is unscheduled while waiting for the page to come off the disk. -Raymond]
  5. @dave:

    Raymond owes you a star.

  6. John Doe says:

    Please excuse my ignorance about comment formatting.

    I intended to say that this flag is ignored, according to this archived message:


  7. Maurits says:

    I read "speed over memory" as follows:

    There are two ground rules for programming.

    1. Code that runs faster is better than code that runs slower.
    2. Code that consumes less memory is better than code that consumes more memory.

    "Speed over memory" implies that rule 1) is more important than rule 2).

  8. deduplicator says:


    There's an additional rule:

    3. Code that is smaller is better.

    As far as I can see, all three rules help and hinder each other, often interacting in surprising and counter-intuitive ways.

  9. JustSomeGuy says:

    0) Code that runs correctly is better than code that runs fast – there's nothing more un-optimised than 'wrong' :-)

  10. JustSomeGuy says:

    See also stackoverflow.com/…/80189 for a (now updated) answer to this question.

  11. Cheong says:

    @JustSomeGuy: If you count that in, I think you've omitted a more important rule.

    -1) Code that never runs is fastest. – Stop inserting garbage in your code.

    Unintentional violations of this rule often occur in multithreaded programming, where someone uses a locking construct in a code block that could never introduce a race condition, or locks on a value-type variable instead of a reference type.

  12. caf says:

    So COINIT_SPEED_OVER_MEMORY works exactly like the "Door Close" button in an elevator.

  13. pinwing says:

    The MSDN page must have been updated, because currently it documents COINIT_SPEED_OVER_MEMORY as follows: "Increase memory usage in an attempt to increase performance."

  14. Nick says:

    @caf: I've been in a few elevators where the "Door Close" button actually makes the doors close. Sure, in most, the button does nothing. But in these few, the button either reduces the door-open delay from several seconds to one or two, or immediately starts closing the doors.

  15. John Doe says:

    As far as I can tell, one possible use for this flag could be to keep factories referenced/locked between COM calls, possibly releasing them after some time without usage (and at the next COM call). This could be particularly useful to spare the programmer from having to hold on to a factory himself to guarantee faster batch object creation.

    However, this could have implications in the current application, in other applications and even in other computers, depending on where, how (e.g. CLSCTX) and which kind of COM objects you're creating/binding/loading/etc. That is, it would mean trying to be faster at the expense of memory wherever the actual factory is instantiated, and in all the proxies in between.

    Generally, it could mean a set of caches for everything that needs to be looked up, but that may be assumed to be constant for a brief period (exercise: define brief).

  16. @pinwing… if that quote is accurate then it is hilariously inaccurate documentation.  If it were accurate then simply adding a few arbitrary calls to AllocMem() would improve the performance of my code.  ;)

  17. JustSomeGuy says:

    You'll find those door close buttons start doing something real when the building alarms (eg, fire) have been activated. It's a building code violation in many jurisdictions for them to do nothing at all.

  18. 640k says:

    @Jolyon Smith: A call to AllocMem may in fact increase your performance because other cpu-competing processes may die in an out-of-memory crash.

  19. 640k says:

    @JustSomeGuy: The door close button usually reboots the micro processor which controls the door. That's why there's no immediate noticeable action. It's a RESET BUTTON.
