More on 64-vs-32

OK, this isn’t a really meaty post; it’s more a collection of a few ideas that have been rattling around in my head for a while.  I kept hoping they’d develop into something bigger, or that I’d have time to research and investigate them more, but nothing’s happened, so I’ll just dump them as is.

So when comparing AMD64 chips to normal x86 chips, there are two broad categories of differences:

  1. Registers and Pointers are now twice as big.
  2. Killer architecture that helps even x86 code run faster.

Most gamers out there are already aware of #2, so instead I’m going to focus on #1.  Basically I think of it this way: “I’ve already got this screaming system and I need to decide whether it’s worth it to compile my code for 64-bit or leave it as 32-bit.”  There is pain involved in moving to 64-bit unless you happen to be the perfect developer who religiously uses size_t and HANDLE and int.

The biggest downside I’ve run into lately (and this really isn’t new; researchers ran into it almost 20 years ago on the first 64-bit RISC chips) also happens to be the biggest benefit of 64-bit: pointers are now twice as big!  If you have a classic tree structure that contains relatively trivial data, it has suddenly doubled in size.  Now, if your tree really holds enough data to exceed 2GB, then this is probably OK because you need that much address space.  Most apps don’t fall into that category (and for the rest of this post I’m going to assume that the normal 32-bit 2GB of addressable memory is sufficient).  Thus my first criterion is simple: are your data structures pointer-laden?  If they are, you are going to take a performance hit from the increased memory usage.

The ‘fix’ is to change your data structures to use indexes or based pointers, so that only a few real 64-bit pointers exist and most of your data structures use smaller (32- or 16-bit) offsets.  This is somewhat contrary to classic x86 style: why store a 32-bit integer array index when a direct pointer takes the same space?  Now that they aren’t the same size, you need to think carefully about your pointers.  Do you really need a 64-bit pointer, or can you get away with a 32-bit index somehow?

The other advantage is that the registers are bigger, and you get more of them.  More registers mean more things can get enregistered, but only if you don’t do things like take their address or pass them by reference.  Bigger registers mean that the few places where you actually use 64-bit integers are now more efficient!

So what’s the final answer?  My heuristic would be this: do you deal with BIG stuff?  If yes, then try compiling 64-bit; otherwise stick with 32-bit.  If performance really matters and you’re willing to spend several months re-architecting to use fewer pointers, then try 64-bit, but make sure to measure everything.


Comments (5)

  1. The pain of switching to 64-bit is greatly reduced if you’re using C# (or any pure-managed language). If there’s no unsafe code, P/Invoke, etc, it’s quite painless 🙂

    C/C++ is a different story…

  2. I ran into these same "revelations" many years ago when the DEC Alpha came out, and I ported a large graphics system to 64-bit (late 80s? or early 90s?). It was not just "researchers" that were affected.

    In the comments in Rico’s blog I posted tree code using integers instead of pointers as you describe, to show that pooled structs are faster than heap objects even on a 32-bit architecture:

    I also created an example that allows the tree nodes to be incrementally freed. It was not faster than the heap and GC version. If anyone wants it, send me an email.

  3. Grant says:

    Ryan – I agree. I just haven’t seen much code that doesn’t use at least a few P/Invoke’s. Of course I also have to remind myself that I’m not the typical C# programmer…

    Frank – Thanks for the extra history. I did see your post on Rico’s blog, but forgot to link to it. Thanks for ‘fixing’ that oversight. An interesting experiment would be to compare all three (your 2 variants and Rico’s GC-based one) x86-vs-Amd64 with significantly larger datasets (i.e. pushing the 2GB memory address limit). Alas, if only I had more spare time…

  4. Tanveer Badar says:

    "More registers means more things can get enregistered, but only if you don’t do things like take their address, pass them by reference, or other such things."

    Not necessarily true.

    int i = 0, *j = &i;

    i can happily reside in a register if j is never used.