The cost/benefit analysis of comparing before an assignment


Consider these two code fragments:

// Fragment 1
x = newValue;

// Fragment 2
if (x != newValue) x = newValue;

Which is faster?

The answer to that question is "It depends."

Fragment 2 introduces extra work (a compare and a branch), and depending on the pattern of comparison results, the branch predictor might or might not have trouble predicting the result of the comparison. From the branch predictor's point of view, it would prefer that the value always be different or the value always be the same, since that lets it predict with 100% accuracy.

On the other hand, Fragment 2 has the advantage of not writing to memory unnecessarily.

For languages with a write barrier, this results in extra work to track the memory modification, as well as extra work for the garbage collector, because it needs to scan all modified memory in old generations to see if it points to objects in newer generations.

Even for languages without a write barrier, unnecessary writes to memory have a cost.

The first write to a copy-on-write page will trigger a copy.

Every write to a page makes the page dirty and means that it will have to be written to disk when paged out, even if the contents in the page file match the values in memory.

On a multiprocessor system, writing to a cache line causes the processor doing the writing to become the owner of the cache line. When multiple processors all write to the same cache line, you suffer from a phenomenon known as cache line bouncing, where ownership of the cache line keeps moving between processors, and the cache lines in the other processors become continuously invalidated due to the write.
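The bouncing effect can be reproduced even when each thread writes only its own variable, as long as the two variables share a line. A minimal C++ sketch of the two layouts (the type names are invented for illustration, and the 64-byte line size is an assumption that holds on typical x86-64 parts):

```cpp
#include <atomic>
#include <thread>

// Two counters that will typically land in the same 64-byte cache line.
struct SharedLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Two counters padded onto separate cache lines (assumes 64-byte lines).
struct SeparateLines {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

// Each thread writes only its own counter, but with SharedLine the two
// writers still fight over ownership of the same cache line.
template <typename Counters>
void hammer(Counters& c, long iterations) {
    std::thread t1([&] { for (long i = 0; i < iterations; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (long i = 0; i < iterations; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
}
```

Timing `hammer` on the two layouts is the usual way to see the ownership traffic; the results are identical either way, only the cost differs.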

So what tips the balance one way or the other?

First consideration: How often is the write avoided?

If the value is usually changing, then the comparison doesn't save you much because you end up doing the write most of the time anyway.

Another consideration is whether the variable is in a page that is otherwise read-only. If so, then you may want to avoid the write to keep the page clean. (This may be worthwhile even if the write is avoided only part of the time, because the cost of writing a page to disk is relatively high on a CPU time scale.)

Similarly, is the write into a cache line that is otherwise read-only? Avoiding the write allows the cache line to remain valid on all other processors and avoids cache line bouncing.

Is this cache line even being shared by multiple processors in the first place? If the variable is used by only a single thread, then it won't be in the cache of other processors anyway. (Modulo false sharing.)

These are all considerations to take into account, but as with all performance-related issues, the only way to know for sure is to test it under representative loads. Still, these early considerations may let you filter out the cases where avoiding the write is unlikely to offer any performance benefit.
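One way to make the first consideration concrete before reaching for a full profiler is to instrument both fragments against a representative input stream and count the stores. A small C++ sketch (the `Counting` type and the workload are invented for illustration; it counts stores rather than measuring time):

```cpp
// Instrumented versions of the two fragments, counting actual stores.
struct Counting {
    int value = 0;
    long stores = 0;

    // Fragment 1: always write.
    void assignAlways(int v) { value = v; ++stores; }

    // Fragment 2: compare first, write only on change.
    void assignIfChanged(int v) {
        if (value != v) { value = v; ++stores; }
    }
};
```

Feeding both a stream where the value rarely changes shows how many writes the comparison actually saves; if the two counts come out nearly equal, the compare is pure overhead.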

Comments (23)
  1. kantos says:

    I realize in kernel-land this probably matters a ton, but if I saw this code in code review I’d probably flag it, the reason being that we shouldn’t try to ‘help’ the compiler. Moreover, on modern systems, as we’re well aware from SPECTRE and MELTDOWN, that write would have to have a full fence in front of it inserted by the compiler, since it’s conditional. Without the conditional, that mitigation isn’t necessary. The end result is that the page will be resident and probably copied long before I ever have to worry about it. Hence I’d rather the write be non-conditional, as it makes the code clearer and easier to understand. That saves time for another engineer who wonders why someone would do that in the first place.

    Final note: by removing the conditional you also let the compiler reorder it as appropriate and potentially the CPU can too! (Although x86 and x86-64 won’t)

    1. Well, if only there was a comment optimizer to optimize “Although x86 and x86-64 won’t” into “Although x86” because “x86-64” is “x86” already. 😉 But then, it begs the question: What platform were you talking about so far?

  2. Antonio Rodríguez says:

    This article’s discussion assumes that the assignment is atomic or of limited size (i.e. a 64-bit value on a 32-bit processor). If you are writing to a larger object (an array/string, a whole structure, etc.), the question is whether the assignment itself is significantly more expensive than the compare. Cache and paging continue to be an issue, but as large objects tend to span several cache lines and can straddle two pages, their analysis is more difficult.

    1. kantos says:

      Honestly? Unless it shows up as a constant issue in the profiler in some hot code… I’m probably going to stick to the simple answer and allow the assignment unconditionally. Like all micro-optimizations, profile first, make changes later.

      1. kc0rtez says:

        “Like all micro-optimizations, profile first, make changes later.” With complex structures it can cut more than half the work, so you shouldn’t always consider it a micro-optimization. Also, bear in mind that not all languages have good profilers to pinpoint the issue AFTER the code is there, so it might save you quite a lot of hassle if you optimize simple stuff during development. In the end, it might just be better to code with common sense.

        1. kantos says:

          This isn’t a language issue; it’s a statistical-analysis issue. You don’t need specialized profilers to do this; ETW could do it. Unless that function in particular is a hot spot I wouldn’t worry about it, because 90% of the time you’re wrong in your original assumptions. For example, most people would assume that smaller structures are always better. But if that causes your structs to then cross cache lines in an array, it could be slower. But even that doesn’t matter, because the CPU prefetcher may detect that data pattern and fetch it anyway. My point is that there are too many variables to quantify in this case to assume. For all you know, the CPU could consistently mispredict on that branch, which would cause a massive penalty on some CPUs. It’s also worth noting that conditional writes are the core of SPECTRE and MELTDOWN, so more than likely if you have those mitigations enabled it is going to insert a full fence after the read. In kernel-land there are other considerations (triggering a page fault may cause a bug check) that need to be accounted for. But in userland you should always, always profile first and never assume.

      2. Antonio Rodríguez says:

        There are “micro-optimizations”, and then there are what I call “inline micro-optimizations”. That consists of writing code while having the low-level data structures in mind (“reverse-engineering the compiler”, if you want to see it that way). Then, when you write an allegedly expensive (or “heavy”) operation, stop a few seconds (no more than five or ten!) to think about how you can optimize it out. It doesn’t take too much time once you get used to it, and the results are spectacular. But you need to have some knowledge of what’s under the hood, even if you are using a modern, high-level, garbage-collected language.

        Of course, if you find yourself asking the same questions about the same datatype over and over, you may want to abstract it in a class or a module. Or maybe it’s that the data structure you are using isn’t the right one for the job. Don’t be afraid of refactoring or throwing out code (unless your company policies don’t allow you to; but then, your case belongs to The Daily WTF…).

      3. alegr1 says:

        To be fair, the cost of write may not be seen by the profiler immediately at this line.

  3. kc0rtez says:

    “if (x != newValue) x = newValue;” should not be the optimal solution in pretty much any scenario, since there are smarter ways to write the conditional. A better solution would be to assign said variable where the value is changed, avoiding any further checks and thus getting optimal performance, or simply querying the value directly instead of storing it in a variable (considering you either can track the changes in the value or can query it at any given time; but heck, I can’t think of a scenario where at least one of these is true).

    1. Chris Crowther says:

      In .NET apps where you commonly see that conditional check is in a property setter that checks to see whether the value being assigned to the property is actually different to what it already has; the additional expense there is after changing the value you often raise a PropertyChanged event for the property, with all the associated overhead that goes with it.
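      A C++ transliteration of that setter pattern (the `ViewModel` class and `PropertyChanged` hook are stand-ins for the .NET machinery, invented for illustration):

```cpp
#include <functional>
#include <string>

class ViewModel {
public:
    // Stand-in for the .NET PropertyChanged event.
    std::function<void(const char*)> PropertyChanged;

    void setName(const std::string& v) {
        if (name_ == v) return;  // unchanged: skip the store *and* the notification
        name_ = v;
        if (PropertyChanged) PropertyChanged("Name");
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

      Here the comparison pays for more than a skipped store: it suppresses the whole notification cascade, which is usually the dominant cost.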

    2. zboot says:

      I think you didn’t read the article. If you always write to memory when the value is written, that results in an expensive operation if the value being written doesn’t actually change. You’re assuming the only time a write occurs is when there’s a change of value.

  4. mvadu says:

    In any modern language that gets compiled multiple times (C# -> IL -> machine code), even if we profile, this is not something you will notice. This level of tuning only makes sense if I am coding for Arduino- or ESP32-class processors. (I can’t relate to Windows kernel development, as I have not done any, and I don’t see myself getting into Raymond’s organization anytime in the future.)

    1. CN says:

      That’s why Raymond’s points about paging are more relevant. Now, in some high-level languages that might be less likely, but with e.g. page deduplication in both OSes and hypervisors, you could at least in some circumstances have your actual working set minimized by copy-on-write semantics even when never really caring about it. If you have any rate of swap activity, or if the data is backed by a memory-mapped file (less likely in the .NET case, I know), then it could matter greatly. And, of course, if you are in the pathological case of lots of threads accessing (and writing to) shared variables in the same cache line.

  5. DWalker07 says:

    A similar issue arises in databases, where you want to change some values in a table. During data cleanup, should you replace all values in a character column with a space-trimmed value, for example, or only replace those values that don’t already match their space-trimmed equivalents? I lean toward reducing attempts to “rewrite with the same values” in database tables, especially if that value usually won’t actually change.

    Even though I’m not sure if the database engine is smart enough to not actually change the data and rewrite the table data-pages in these cases. (Don’t try to fool the compiler or the database engine internals.)

    1. Karl says:

      A database engine should not trim written values. A database should store what it is told to store, should not try to be pragmatic.

      1. DWalker07 says:

        You can’t make a blanket statement like that. When data comes from systems (that WE have no control over) and gets directly imported into database tables, then cleaned, IF that data came from hand-entered account numbers or Zip codes (for example) where leading and trailing blanks should not actually be there, then trimming the blanks from the data is the right thing to do. Who wants to keep leading blanks on a Zip code?

      2. IanBoyd says:

        @Karl ANSI/ISO SQL-92 specification requires database engines to trim trailing spaces (when doing certain things).

        And you certainly don’t want to needlessly take exclusive locks, and trigger triggers, when it’s not needed.

        Like you don’t want to cause a page copy when it’s not needed.

        1. Yukkuri says:

          What certain things exactly?

        2. DWalker07 says:

          Yep, but we get leading spaces too, which — for some data values — are not significant (and shouldn’t be there).

    2. Ian Yates says:

      Good example though. And it shows how it can get more complicated as you go to higher levels of abstraction.

      If this wasn’t a field, but a C# property with a setter, you’ve got side effects.
      We do similar things with our Knockout JavaScript code, where we avoid writing what looks like the same object (same data, but a different instance), since it’s wasteful to do so – it’ll trigger downstream notifications, which might repaint screens, etc., to actually have zero visible change.
      We hide that complexity by using a comparer on the observable, so we can write the simple assignment but get the “only write if meaningfully different” behaviour implicitly (well, in this case it still writes, but only notifies if different).

      This is part of React’s virtual DOM claim to fame.

      For databases, updates are a bit like a property set since you have triggers, concurrency, etc.

      In short, abstractions matter, and this shows advice in one situation can’t be cargo cult applied to all.

      I like nuance ☺️

  6. The_Assimilator says:

    “Factors pull in both directions. The result is a balance.” Very Zen of you, Raymond. :)

  7. IanBoyd says:

    Similar to the question Eric Brumer posed in one of his C++ performance talks.

    You want to call a function pointer (which is simply what a virtual method call is):

    Version 1:

    pFunc(42);

    Version 2:

    if (pFunc == FuncAdd)
        FuncAdd(42);
    else if (pFunc == FuncLn)
        FuncLn(42);
    else
        pFunc(42);

    Rather than stalling on the fetch of the address contained in pFunc, you let the CPU’s Branch Predictor (which is right 90% of the time) simply start executing the function.

    The Visual Studio compiler, with Profile Guided Optimizations (PGO), will insert these kinds of optimizations for you.
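    A compilable rendering of the two versions (the function names are stand-ins; a PGO build generates guards like these from profile data):

```cpp
using Fn = int (*)(int);

static int FuncAdd(int x) { return x + 1; }
static int FuncLn(int x)  { return x - 1; }
static int FuncMul(int x) { return x * 2; }

// Version 1: plain indirect call; the CPU must load pFunc before it can
// begin fetching the target.
int callIndirect(Fn pFunc, int arg) { return pFunc(arg); }

// Version 2: compare against the likely targets so the branch predictor
// can speculate straight into a direct call.
int callGuessed(Fn pFunc, int arg) {
    if (pFunc == FuncAdd) return FuncAdd(arg);
    else if (pFunc == FuncLn) return FuncLn(arg);
    else return pFunc(arg);  // unpredicted target: fall back to the indirect call
}
```

    Both versions return the same result for every target; the guarded form just gives the predictor something to work with on the hot targets.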

  8. You are missing a third fragment.
    /******************************************/
    x = newValue;
    if (x != newValue) x = newValue;
    /******************************************/

    This is something I often do when most of the time a condition is true but rarely it’s not.
    After all a single write tends to be faster than both a read and a write.
    Sure, one could argue that an if conditional is done needlessly, and perhaps earlier in the program a state is known and checkable to avoid reaching this code at all.

    The fastest code to execute is the code you don’t have to execute at all.

    I often “waste” extra code before or after a loop if it means I end up with less code inside the loop for example.
    I try to apply the same principle to if else as well.
    And within functions I often just do an if and then exit the function, rather than an if-else nested inside another if-else. I call this the “exit early principle”: if a condition is false, just exit the function. This means most of my functions only have a single “if” nesting level.

Comments are closed.
