A little 64 bit follow-up


I recently wrote a pair of at least slightly controversial articles about 64-bit vs. 32-bit applications and the performance costs associated with going to 64 bit.

They are here and here.

Some people thought I was nuts.

Some people provided benchmarks “proving” that 64 bit code is in fact faster than 32 bit code.

I guess the lesson for me here is that I can’t say often enough that I only ever write approximately correct blog entries, and that YMMV should be stamped on top of everything.

But, just so you don’t think I’ve lost my mind, here is some data.

These are recent builds of IE and Edge in 32 bit and 64 bit.  It’s the same code, or as close as possible to the same code.  These are just the main engines of both, plus the old script engine and the new script engine.

                 32 bits (bytes)   64 bits (bytes)   Growth
  edgehtml.dll        18,652,160        22,823,936    22.4%
  mshtml.dll          19,322,880        24,577,536    27.2%
  chakra.dll           5,664,256         7,830,016    38.2%
  jscript9.dll         3,667,968         4,895,232    33.5%

 

This result is completely typical — between 1.2 and 1.4x growth is very normal for mainstream code.  This is not universal, but it’s in line with what I’ve seen elsewhere.
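Where does that code growth come from? A big piece of it is average instruction size (a point that also comes up in the comments below): operating on 64-bit pointers needs a REX prefix byte, and the pointers embedded in the image double in size. A rough, hypothetical illustration follows; the exact bytes depend entirely on the compiler and calling convention.

```cpp
struct Node { Node* next; };

// A trivial pointer load, just to show the encoding difference.
Node* follow(Node* n) { return n->next; }

// Representative encodings of that load (actual compiler output varies):
//   x86:  8B 01       mov eax, dword ptr [ecx]    ; 2 bytes
//   x64:  48 8B 01    mov rax, qword ptr [rcx]    ; 3 bytes (extra REX.W prefix)
//
// One extra byte on many pointer-sized operations, across millions of
// instructions, accounts for a good chunk of growth like that in the table above.
```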

Yes there are exceptions.  Yes it totally depends on the codebase.

Now, is this going to be slower?  Almost certainly, because bigger is slower.

Why almost?

Well, if the code has really good locality (e.g. you only use a tiny slice of it at a time, or you move from one well-used slice to another well-used slice), then it might be that the parts of the code that are hot at any given moment still fit well into the cache.  But that’s sort of a lie as well.  You see, in a real system there is cache pollution: device drivers are running, background processes are running, and so it’s never really the case that there is surplus cache capacity.  In real systems extra cache capacity basically just creates the opportunity for your code to remain efficient in the presence of other normal workloads.  Micro-benchmarks don’t capture these effects.
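To make the micro-benchmark point concrete, here is the kind of loop that tells you nothing about cache pressure (purely illustrative, not from any real codebase): its working set is tiny and hot, so it stays in L1/L2 and measures about the same in a 32-bit or 64-bit build, regardless of what the rest of the system is doing to the cache.

```cpp
#include <cstddef>
#include <cstdint>

// Sums a small array.  With n in the low thousands the whole working set
// fits comfortably in L1/L2, so after warm-up the loop never misses the
// cache and neither pointer size nor image size shows up in the timing.
std::uint64_t sum_small(const std::uint32_t* a, std::size_t n) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```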

I also said that the data is always bigger.  People disagree, but there really is no room for disagreement here.  I think we can all agree that going to 64 bits won’t make your data smaller, so the best you can hope for is a tie.  I also think you are going to have at least one pointer on the stack somewhere.   So a tie isn’t possible.  Pedantic yes, but on point.  The truth is that depending on your workload you will see varying amounts of growth and no shrinkage.  If your growth is sufficiently small, or the locality of the data is sufficiently good, then the extra cache misses from size growth will not be so bad.  Micro-benchmarks will often still fit in the cache on a 64 bit system. 
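To see where the data growth comes from, here is a hypothetical pointer-heavy record (not taken from any of the products above); nothing about its logic changes, yet it doubles just by being recompiled for 64 bits.

```cpp
#include <cstdio>

// Hypothetical pointer-heavy record, just to show the effect of recompiling.
struct Element {
    Element*    next;    // 4 bytes on x86, 8 on x64
    Element*    child;   // 4 bytes on x86, 8 on x64
    const char* name;    // 4 bytes on x86, 8 on x64
    int         flags;   // 4 bytes either way
};

int main() {
    // Typically 16 bytes in a 32-bit build and 32 bytes in a 64-bit build
    // (three pointers double, plus padding to restore 8-byte alignment).
    std::printf("sizeof(Element) = %zu\n", sizeof(Element));
}
```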

Visual Studio is in a poor position with respect to data growth because it has a lot of pointer-rich data structures.  Microsoft Edge, Chrome, and Firefox are in relatively better positions because much of their data is not rich in pointers — bitmaps, for instance, are essentially identical (minus some metadata), strings are identical, and styles can be stored densely.  As a consequence, browsers will suffer less than VS would.
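As a sketch of the difference (illustrative only, not how any of these products actually lay out their data): a node that links to its neighbors with raw pointers grows on x64, while a node that refers to them with 32-bit indices into a pool stays the same size in both builds.

```cpp
#include <cstdint>
#include <vector>

// Pointer-rich layout: both links double when pointers go to 8 bytes
// (12 bytes in a 32-bit build, 24 bytes with padding in a 64-bit build).
struct NodeWithPointers {
    NodeWithPointers* parent;
    NodeWithPointers* firstChild;
    std::uint32_t     styleBits;
};

// Denser layout: 32-bit indices into a contiguous pool are 4 bytes on
// either architecture, so this stays 12 bytes in both builds.
struct NodeWithIndices {
    std::uint32_t parent;      // index into g_nodePool
    std::uint32_t firstChild;  // index into g_nodePool
    std::uint32_t styleBits;
};

std::vector<NodeWithIndices> g_nodePool;  // all nodes live here; indices replace pointers
```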

The bottom line is that this will vary a great deal based on your workload.  But the price of cache misses should not be underestimated.  A modern processor might retire over 200 instructions in the time it takes to service one cache miss, and more in extreme situations.  That’s a lot to recover from; extra registers alone won’t win it back universally.
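For a rough sense of where a number like 200 comes from, here is the back-of-envelope arithmetic, with assumed (not measured) figures:

```cpp
#include <cstdio>

int main() {
    // Assumed, round numbers; real values vary a lot by machine and workload.
    const double clock_ghz       = 3.0;   // cycles per nanosecond
    const double ipc             = 1.0;   // sustained instructions per cycle
    const double miss_latency_ns = 70.0;  // load that has to go all the way to DRAM

    // Instructions the core could have retired while stalled on one miss.
    const double lost = clock_ghz * miss_latency_ns * ipc;
    std::printf("~%.0f instructions per miss\n", lost);  // ~210 with these numbers
}
```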

I wrote about these phenomena in greater detail in this article from 2014.  That article includes a quantitative examination of data growth and locality effects.

I had some top level observations for anyone contemplating a port:

* if you think you’re going to get free speed, you have it backwards — you start in the hole and will have to win back your losses with processor features; definitely possible, but not a given

* if you’re going to 64 bits because you’re out of memory but you could fix your memory problems with “let’s not do this stupid thing anymore” then you should do that

* if you’re avoiding 64 bits because you keep saying “I can stay in 32 bits if I do this new stupid thing” then you’re kidding yourself, crazy workarounds are doomed

Visual Studio in 2009 was definitely in the situation where, by avoiding some dumb stuff, we could make it fit.  I don’t know where it is in 2016.

This also excludes the collateral benefits you get from going to 64 bits, such as avoiding the WOW64 subsystem, cleaning up an otherwise potentially clumsy architecture, and gaining security from increased exploit difficulty.  Some of these benefits may be very important to your workload/customers.

 

 


Comments (24)

  1. tony roth says:

    thanks for this,  getting tired of trying to explain this to people!

  2. Ben Craig says:

    I agree with the overall statement.  I have a minor objection to one statement though…

    "I also said that the data is always bigger.  People disagree, but there really is no room for disagreement here.  I think we can all agree that going to 64 bits won't make your data smaller, so the best you can hope for is a tie."

    Depending on a bunch of stuff, it is possible to get smaller stack usage on AMD64 compared to x86.  Microsoft is poorly positioned here because of shadow stack space, but on Linux, what would have been memory pushes and pops for the first six arguments will instead just live in registers.

    This is really minor though, as stack locality still tends to be good, and in practice, net stack usage still tends to go up because of other pointers that are hanging around.  Also, heap usage is pretty much doomed to a tie at best (as you suggest).

  3. ricom says:

    OK, I'll relent, there are some cases that are sufficiently exotic that you can actually see a savings.

  4. Billy O'Neal says:

    I bet much of the growth for the browsers comes from the calling convention difference. My (extremely limited) understanding is that the x64 calling convention works like __cdecl — the caller cleans up the stack, whereas the x86 calling convention often used, __stdcall, lets the callee clean up the stack. Since a given function usually has more than one caller, there's some duplication here that __stdcall saves.

    But I'm not an expert in this area and could be completely misunderstanding.

  5. ricom says:

    I think the x64 calling convention is actually better.  It's really about average instruction size.

  6. Cory Nelson says:

    Windows / VC++ really needs a mode where you're in 64-bit mode but your address space is 32-bit. For most apps, this would almost definitely end up more performant than either of the current 32-bit or 64-bit modes.

  7. ricom says:

    The old segmented compiler had "memory models" (small, medium, compact, large) to allow different combinations: 16-bit code and 16-bit data for small, 32-32 for large, and 32-16 and 16-32 for medium and compact respectively.  It was a bit of a nightmare requiring 4 different C runtime libraries…  I'm not sure I'm so eager to go back to that.  The cure is worse than the disease.

  8. Niklas Bäckman says:

    But, shouldn't everything move on to 64-bit at some point anyway? Or do you expect that 32-bit will be equally well supported indefinitely?

  9. ricom says:

    Yeah.  It's always a question of when.

  10. Daz says:

    The idea that because a program is larger it must be slower is really not the case.  You have to remember that modern processors take larger chunks of data at a time when using 64 bit and hence can process data much quicker overall. They actually have to restrict themselves to operate in 32 bit mode. Look at programs like Photoshop and Excel etc. and you definitely notice the speed difference when running 64 bit.  Also, unlike days past, memory restrictions are not an issue.  Who cares if a program is 20% larger on modern systems?

    The other point is memory and storage access, which is severely limited with 32 bit.  Only the most basic machines now have less than 4 GB, and multitasking etc. on larger memory systems is again severely restricted with 32 bit.

    Going by some of the beliefs here then we should all really go back to 16 bit because that would be small, maybe even 8 bit? Small is faster….

    Personally I would like to see 32 bit phased out completely and everything run true 64 bit, especially on desktop and modern hardware.   Maybe for some IOT devices then 32 bit is fine (or 16) but for general programs then 64 bit is definitely the way to go.

    I believe if Windows were a complete 64-bit system it would be 50 to 100% faster than the hybrid system it currently is. Same with any modern OS if they shifted to strictly 64 bit.

  11. ricom says:

    Sadly memory doesn't work like that.  The processor always reads memory via the cache.  The cache is the same regardless of the size of the instruction set being used for any given processor.  Better instructions will mean you can get data out of the cache faster, but not read memory faster.  If you need more cache because your data got bigger you will incur more cache misses.

    Many modern programs are dominated by memory bandwidth rather than instruction execution.  The machine word size doesn't affect memory bandwidth at all.

  12. ricom says:

    I'm not sure that was very clear.  Let me try to add a bit more.  If you grow your data the problem becomes how quickly you can move the data from the main RAM into the processor cache.  That rate is not affected by the instruction set, only the amount of memory that has to be moved.  Because moving memory into the cache is so much slower than executing instructions it is frequently the dominant effect in systems that are rich in pointers.

  13. Most Interesting Man in the World says:

    I don't always write high performance code, but when I do, I write Z80 assembly.

  14. Tom says:

    Currently you may be correct, however the arguments you make are very (repeat very) reminiscent of the arguments to avoid 16 bit code during the switchover, and the arguments to avoid 32 bit code during that switchover. I am unconvinced that even in the medium term it will make any actual real world performance difference, and I expect that in the long term (10-15 years) we will be having a similar argument about 64 bit vs. 128 bit code; that seems to be the pattern.

    Yes, right now Processor Cache to Memory bandwidth is a problem. 5 years ago disk to memory bandwidth was a problem, 5 years before that Memory size was a problem, 5 years before that Processor to Memory bandwidth was a problem…see a pattern?

    The short answer – analyze your problem, analyze what hardware you expect to run on, analyze what your users expect, and analyze what the lifespan of this particular iteration of the product is, then use that to pick your target: 32 vs. 64 bit.

  15. ricom says:

    Well, actually those are still problems.

  16. Fred Bosick says:

    We won't be going to 128 bits any time soon. Just look at the memory possible with 64 bit addressing.

  17. OldETC says:

    Compiling the same code with the same flags may not generate 64 bit code.  Check that your Make file or other control file actually sets up properly for 64 bits.  Also the compiler design may restrict the floating points to 32 bits which is actually slower on the 64 bit architecture.  Use doubles if in doubt.

  18. F.D.Castel says:

    One has to love when people counter plain facts with "I am unconvinced", "I believe" and other gems ("50% to 100% faster"!! OMG! What are we waiting for??).

  19. Cubee says:

    What Rico is explaining here could be qualified as common sense. If you don't get that, you should not be writing code that depends on the stuff that is explained here. It is definitely NOT the same argument as the old 16 vs 32 bit. And I really can't seem to remember that argument anyway; everybody was happy with that particular improvement. But since 16 bit was mentioned: yes, a random program that would fit completely in 16 bit memory space and could use the same addressing mechanism as the current 32 bit code – so not the old segmented mechanism – would on average be faster than its 32 bit counterpart.

    There is no black magic inside a CPU. Stuff occupying space is slower to move than stuff occupying less space, and it still occupies the same amount of space after it has been moved.

    "Pointers, messing with peoples minds since…well ever"

    As a side note, so you know where I am coming from, the application that I am working on benefits greatly from 64 bit addressing since we actually have data structures that need to be contiguous in memory and are super large. That is what you get when you have old – but tried and tested – number crunching algorithms being used in the 21st century on models 100 times the size of the old ones.

  20. Ron says:

    This all sounds familiar. "640K ought to be enough for anybody."

  21. ricom says:

    Well, no actually, it's nothing like "640k ought to be enough for anyone".  There are tons of workloads that cannot avoid having buckets of data resident.  There are also plenty of programs that load way more data than they need to load and if any care was used at all they would be very fast.  Allowing them to bloat even more, so they can run more slowly than ever, isn't a better answer than "fix your ***."

    Maybe I should have said, as was once told to me years ago, "Cache is the new RAM;  RAM is the new disk.  Stop using RAM stupidly, you never would have done this with disk."

  22. JohnM says:

    Alexandrescu claims 64 bit code with 32 bit data (where possible) will be faster, largely because 64 bit compiler optimizations beat out 32 bit optimizations, simply because of the work being put into them. I've no idea whether that's true or not, but I doubt he's making it up.

    http://www.youtube.com/watch

  23. Dirk says:

    Your posts are truly a very good discussion of the pros and cons of 32-bit vs 64-bit code. Besides performance there is one very important aspect that I didn't find in your articles: security.

    Only with a 64-bit process you can actually benefit from (high-entropy) ASLR. Within a 32-bit process ASLR can be defeated by brute-force in very short time.

  24. ricom says:

    @Dirk: I did mention security in there somewhere in one of them.  But you're exactly right, ASLR is a key benefit.  There are others — such as no WOW.
