Five-Dollar Words for Programmers, Part One: Idempotence


Programmers, particularly those with a mathematical background, often use words from mathematics when describing their systems. Unfortunately, they also often do so without consideration of the non-mathematical background of their colleagues. I thought I might talk today a bit about the word “idempotent”. This is a very easy concept but lots of people don’t know that there is a word for it.

There are two closely related definitions for “idempotent. A value is “idempotent under function foo” if the result of doing foo to the value results in the value right back again.

function is “idempotent” if the result of doing it twice (feeding the output of the first call into the second call) is exactly the same as the result of doing it once. (Or, in other words, every output of the function is idempotent under it.)

The most trivial mathematical example of the second kind is the constant function.  If f(x) = c, then clearly f(x) = f(f(x)) = f(f(f(x))) … so f is idempotent (and the constant is idempotent under it). The identity function f(x) = x is also idempotent (and every value is idempotent under it). The function that takes a set of numbers and returns a set containing its largest element is idempotent (and every one-element set is idempotent under it). I’m sure you can think of lots of examples of idempotent functions.

To get a handle on the other sense, pick an operation — say, doubling.  The only value which is idempotent under that operation is zero. The operation “subtracting any non-zero value” has no idempotent values. Squaring has two idempotent values, zero and one.

The second characterization of this concept comes up all the time in practical programming, particularly around caching logic. Usually when used in the computer science sense we don’t mean that the effect of the function is invariant under composition, but rather that it is invariant over the number of calls. For example, I don’t know how many times I’ve written:

HRESULT GetTypeLibCreator(ICreateTypeLib2 ** ppctl) {
  if (this->m_pctl == NULL) {
    HRESULT hr;
    hr = CreateTypeLib2(SYS_WIN32, pszName, &this->m_pctl);
    if (FAILED(hr)) return hr;
  }
  *ppctl = this->m_pctl;
  this->m_pctl->AddRef();
  return S_OK;
}

A nice little idempotent function — calling it two, three, n times has exactly the same result as calling it once.

The place you see the other characterization of idempotence all the time is in C++ header files. Include a needed header zero times and you’ll get “not defined” errors. Accidentally include it twice and you’ll get “redefinition” errors. It’s a major pain to make sure that every header file is included exactly once. Therefore, most headers use  some trick to make them idempotent under the inclusion operation:

#ifndef STUFF_H_INCLUDED
#define STUFF_H_INCLUDED

// headers here

#endif // STUFF_H_INCLUDED

or in more modern systems, the #pragma once directive makes headers idempotent under inclusion.

Comments (20)

  1. Matt says:

    Good stuff… I approve.

  2. mschaef says:

    Funny stuff. After doing some DBA style work a few months ago, I wrote a little on idempotence myself. I like the technique a great deal.

    http://www.mschaef.com/cgi-bin/blosxom.cgi/tech/prog_well/idempotence/idempotence_1.txt

    http://www.mschaef.com/cgi-bin/blosxom.cgi/tech/prog_well/idempotence/idempotence_2.txt

  3. Just to be annoyingly picky, "subtraction" is a function that takes two values, so I’m not sure what it means to say that no value is idempotent under subtraction. Unless the function is x -= 1, I suppose 🙂

  4. Reinder says:

    Stuart Langridge wrote:

    Just to be annoyingly picky, "subtraction" is a function that takes

    two values, so I’m not sure what it means to say that no value is

    idempotent under subtraction. Unless the function is x -= 1, I

    suppose 🙂

    I am one of those programmers with a mathematical background Eric is talking about, so I can be really picky, if I want to: the function "x -= 1" is idempotent for the value -inf, and probably for some NaNs, too.

  5. EricLippert says:

    I indended to say "subtracting a non-zero value" — I’ve fixed the text.

  6. I’m no a C programmer, but it looks like GetTypeLibCreator is not an idempotent function, because its input and output types are not the same, i.e. you cannot do GetTypeLibCreator(GetTypeLibCreator(x))

  7. Anonymous Coward says:

    More importantly, wouldn’t it be considered non-idempotent because of the AddRef call which changes the reference count on the returned object. Calling this function twice would not result in the same system state as calling it once…

  8. PatriotB says:

    If a function is idempotent, does that mean it’s really a dysfunction?

    Oops, missed a couple letters there…

  9. Marc Brooks says:

    Quote: The operation "subtracting any non-zero value" has no idempotent values

    Sure it does, positive and negative infinity 🙂

  10. Gabe says:

    Objective-C fixed the #include problem by creating #import. It’s exactly the same as #include except for the fact that it will include a file exactly once.

    You have to wonder why the C designers didn’t think of that.

  11. Carlos says:

    In my experience, the most common kind of idempotence in computing doesn’t fit either of the mathematical definitions. It’s more like: if you call the same function twice with the same parameters, the resultant system state is the same as if you had called it once. You could call it operational idempotence.

    Of course, it does fit the mathematical definition if you include the entire state of the computer as an implicit parameter to the function and as part of the result, but programmers don’t usually think in these terms.

  12. mschaef says:

    "Of course, it does fit the mathematical definition if you include the entire state of the computer as an implicit parameter to the function and as part of the result, but programmers don’t usually think in these terms. "

    If you have side effects in your langauge, don’t you _have_ to think that way?

  13. Carlos says:

    "If you have side effects in your langauge, don’t you _have_ to think that way?"

    Monads in Haskell make this idea explicit, but most programmers find them difficult because they don’t think like this.

    In most languages you think about things (like objects, variables, files, etc.) having things done to them. This is quite different from thinking functionally. Well, it is in my head anyway 🙂

  14. Miles Archer says:

    WHY?

  15. Jonas Grumby says:

    Idempotent (in my experience) is rarely applied to functions from R to R. It’s usually reserved for functions that operate on "more complex" domains. The place you run into it frequently is linear algebra and operator theory. So things like subspace projection operators are described as idempotent.

    Additionally, I don’t think I’ve ever heard anyone say anything like "squaring has 2 idempotent values". I would say "squaring has 2 fixed points", or something like that. Fixed points and idempotency are related; in the subspace projection example, half of the reason that it’s idempotent is that every vector in the subspace is a fixed point (the other half is that the operator maps every point into the subspace).

    The ways I’ve seen it used in CS/SE are an extension on this. They require that you to think of the program as a state machine. Some operations can be run more than once with the same affect as if you had run it once. But the concept is really only interesting if the operation has side effects; otherwise, running the operation once is also the same as never running it, from a program state POV. That is, it’s too much like the identity operator. It is good to know that some functions don’t have side effects, but I think if that’s what you mean you should just say so, and save your $5 word for a place where the function *does* have side effects, but is *still* idempotent.

  16. Dave says:

    May be pedantic (or perhaps I’m an idiot) but… Didn’t you mean #ifndef?

  17. That one guy says:

    We are supposed to be saying what think about the page, not fighting with each other here…

  18. Mike Miller says:

    I know this is an old post, but I just saw it (courtesy of Coding Horror). When you write, "The operation "subtracting any non-zero value" has no idempotent values," that’s not quite true.

    In a mathematical sense, there’s negative and positive infinity (of all Alephs). In the realm of computation, the value NaN (if it exists) is usually idempotent on subtraction (or addition) of any value.

  19. For what it’s worth, the C and C++ standards didn’t include #import or #pragma once because they’re both very hard (perhaps equivalent to the Halting Problem, on any sufficiently modern system) to get right. I’ve certainly never seen a compiler that did #pragma once correctly, once you account for hardlinks, softlinks, WinDOS slash-backslash-case-insensitivity issues, and being able to mount things in different places.

    Most compilers guess right, most of the time; but "most" isn’t a word that should appear in *any* ISO standard.

  20. Shane Tolmie says:

    The keyword "idempotent" comes up very frequently when writing interfaces for ZeroC ICE, a framework for message passing between two programs.

    The ICE docs explain it a lot better than I can, but essentially, if you mark a method as "idempotent", the message passing is much more robust, as it can retry the operation if it fails the first time.