A little 64 bit follow-up


I recently wrote a pair of at least slightly controversial articles about 64-bit vs. 32-bit applications and the performance costs associated with going to 64 bit.

They are here and here.

Some people thought I was nuts.

Some people provided benchmarks "proving" that 64 bit code is in fact faster than 32 bit code.

I guess the lesson for me here is that I can't say often enough that I only ever write approximately correct blog entries and YMMV should be stamped on top of everything.

But, just so you don't think I've lost my mind, here is some data.

These are recent builds of IE and Edge in 32 bit and 64 bit.  It's the same code, or as close as possible to the same code.  This is just the main engine of both plus the old script engine and the new script engine.

              32 bits (bytes)   64 bits (bytes)   Growth
edgehtml.dll       18,652,160        22,823,936    22.4%
mshtml.dll         19,322,880        24,577,536    27.2%
chakra.dll          5,664,256         7,830,016    38.2%
jscript9.dll        3,667,968         4,895,232    33.5%

 

This result is completely typical -- between 1.2 and 1.4x growth is very normal for mainstream code.  This is not universal, but it's in line with what I've seen elsewhere.

Yes there are exceptions.  Yes it totally depends on the codebase.

Now is this going to be slower?  Almost certainly, because bigger is slower.

Why almost?

Well, if the code has really good locality (e.g. you only use a tiny slice of it, or you move from one well-used slice to another well-used slice), then the parts of the code that are hot at any given moment might still fit well into the cache.  But that's sort of a lie as well.  In a real system there is cache pollution: device drivers are running, background processes are running, and so it's never really the case that there is surplus cache capacity.  In real systems extra cache capacity basically just creates the opportunity for your code to remain efficient in the presence of other normal workloads.  Micro-benchmarks don't show these effects.

I also said that the data is always bigger.  People disagree, but there really is no room for disagreement here.  I think we can all agree that going to 64 bits won't make your data smaller, so the best you can hope for is a tie.  I also think you are going to have at least one pointer on the stack somewhere.   So a tie isn't possible.  Pedantic yes, but on point.  The truth is that depending on your workload you will see varying amounts of growth and no shrinkage.  If your growth is sufficiently small, or the locality of the data is sufficiently good, then the extra cache misses from size growth will not be so bad.  Micro-benchmarks will often still fit in the cache on a 64 bit system. 
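To put a rough number on the pointer part of that growth, here is a minimal sketch (the struct is hypothetical, not taken from any of the codebases above) of how a pointer-rich record gets bigger just by recompiling the same source for 64 bits:

```cpp
#include <cstdio>

// A hypothetical pointer-rich node: three pointers and one integer.
struct Node {
    Node* left;    // 4 bytes on x86, 8 bytes on x64
    Node* right;   // 4 bytes on x86, 8 bytes on x64
    char* name;    // 4 bytes on x86, 8 bytes on x64
    int   value;   // 4 bytes on both
};

int main() {
    // Typically prints 16 on a 32-bit build and 32 on a 64-bit build
    // (on x64 the last 4 bytes are alignment padding after 'value').
    printf("sizeof(Node) = %zu\n", sizeof(Node));
    return 0;
}
```

How much that doubling matters depends, as above, on how pointer-rich your data actually is.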

Visual Studio is in a poor position vis-à-vis data growth because it has a lot of pointer-rich data structures.  Microsoft Edge, Chrome, and Firefox are in relatively better positions because much of their data is not rich in pointers -- bitmaps for instance are essentially identical (minus some metadata), strings are identical, and styles can be stored densely.  As a consequence the browsers will suffer less than VS would.
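For what it's worth, one common way to keep a data structure pointer-poor is to store 32-bit indices into a pool instead of raw pointers.  A rough sketch of the idea (the names are purely illustrative, not from any particular codebase):

```cpp
#include <cstdint>
#include <vector>

// Pointer-rich list node: 8 bytes on x86, 16 bytes on x64
// (8-byte pointer plus padding after the int).
struct PtrNode {
    PtrNode* next;
    int      value;
};

// Index-based node: 8 bytes on both 32-bit and 64-bit builds, because a
// 32-bit index stands in for the pointer.  All nodes live in one pool.
struct IdxNode {
    uint32_t next;    // index into Pool::nodes; UINT32_MAX means "null"
    int      value;
};

struct Pool {
    std::vector<IdxNode> nodes;   // contiguous storage for every node
    IdxNode& at(uint32_t i) { return nodes[i]; }
};
```

The index version also tends to have better locality, since the nodes are contiguous, though it trades away some flexibility (individual nodes can't be freed without extra bookkeeping).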

The bottom line is that this will vary a great deal based on your workload.  But the price of cache misses should not be underestimated.  A modern processor might retire over 200 instructions in the time it takes to service one cache miss -- more in extreme situations.  That's a lot to recover from; extra registers won't make up for it universally.
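As a rough way to see that cost on your own machine, here is a sketch (a made-up micro-test with arbitrarily chosen sizes; the actual numbers depend entirely on the processor) that chases dependent loads through a small working set and a large one:

```cpp
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Walk a single random cycle of indices; each load depends on the previous
// one, so when the array is much larger than the cache almost every step
// pays for a miss.
static double ns_per_step(size_t elems, size_t steps) {
    std::vector<size_t> next(elems);
    std::iota(next.begin(), next.end(), 0);

    // Sattolo's algorithm: a permutation that is one big cycle, so the walk
    // touches every element before it repeats.
    std::mt19937_64 rng{42};
    for (size_t i = elems - 1; i > 0; --i) {
        std::uniform_int_distribution<size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }

    size_t pos = 0;
    auto start = std::chrono::steady_clock::now();
    for (size_t s = 0; s < steps; ++s) pos = next[pos];
    auto stop = std::chrono::steady_clock::now();

    volatile size_t sink = pos;   // keep the chase from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / steps;
}

int main() {
    // ~256 KB working set (usually cache resident) vs ~256 MB (mostly misses).
    printf("small: %.1f ns/step\n", ns_per_step(size_t{1} << 15, 10'000'000));
    printf("large: %.1f ns/step\n", ns_per_step(size_t{1} << 25, 10'000'000));
    return 0;
}
```

The gap between the two numbers is, essentially, the cache-miss penalty being described above.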

I wrote about these phenomena in greater detail in this article from 2014.  That article includes a quantitative examination of data growth and locality effects.

I had some top level observations for anyone contemplating a port:

* if you think you're going to get free speed, you have it backwards -- you start in the hole and you will have to win back your losses with processor features; that's definitely possible but not a given

* if you're going to 64 bits because you're out of memory but you could fix your memory problems with "let's not do this stupid thing anymore" then you should do that

* if you're avoiding 64 bits because you keep saying "I can stay in 32 bits if I do this new stupid thing" then you're kidding yourself, crazy workarounds are doomed

Visual Studio in 2009 was definitely in the situation where, by avoiding some dumb stuff, we could make it fit.  I don't know where it is in 2016.

This also excludes the collateral benefits you get from going to 64 bits, such as avoiding the WOW64 subsystem, cleaning up an otherwise potentially clumsy architecture, and greater security due to increased exploit difficulty.  Some of these benefits may be very important to your workload/customers.

 

 


Comments (29)

  1. tony roth says:

    thanks for this,  getting tired of trying to explain this to people!

  2. Ben Craig says:

    I agree with the overall statement.  I have a minor objection to one statement though…

    "I also said that the data is always bigger.  People disagree, but there really is no room for disagreement here.  I think we can all agree that going to 64 bits won't make your data smaller, so the best you can hope for is a tie."

    Depending on a bunch of stuff, it is possible to get smaller stack usage on AMD64 compared to x86.  Microsoft is poorly positioned here because of shadow stack space, but on Linux, what would have been memory pushes and pops for the first six arguments will instead just live in registers.

    This is really minor though, as stack locality still tends to be good, and in practice, net stack usage still tends to go up because of other pointers that are hanging around.  Also, heap usage is pretty much doomed to a tie at best (as you suggest).

  3. ricom says:

    OK, I'll relent, there are some cases that are sufficiently exotic that you can actually see a savings.

  4. Billy O'Neal says:

    I bet much of the growth for the browsers comes from the calling convention difference. My (extremely limited) understanding is that the x64 calling convention works like __cdecl — the caller cleans up the stack, whereas the x86 calling convention often used, __stdcall, lets the callee clean up the stack. Since a given function usually has more than one caller, there's some duplication here that __stdcall saves.

    But I'm not an expert in this area and could be completely misunderstanding.

  5. ricom says:

    I think the x64 calling convention is actually better.  It's really about average instruction size.

  6. Cory Nelson says:

    Windows / VC++ really needs a mode where you're in 64-bit mode but your address space is 32-bit. For most apps, this would almost definitely end up more performant than either of the current 32-bit or 64-bit modes.

  7. ricom says:

    The old segmented compiler had "memory models" (large, small, medium, compact) to allow different combinations: 16-bit code and 16-bit data (small), 32-32 for large, and 32-16 and 16-32 for medium and compact respectively.  It was a bit of a nightmare requiring 4 different C runtime libraries…  I'm not sure I'm so eager to go back to that.  The cure is worse than the disease.

  8. Niklas Bäckman says:

    But, shouldn't everything move on to 64-bit at some point anyway? Or do you expect that 32-bit will be equally well supported indefinitely?

  9. ricom says:

    Yeah.  It's always a question of when.

  10. Daz says:

    The idea that because a program is larger it must be slower is really not the case.  You have to remember that modern processors take larger chunks of data at a time when using 64 bit and hence can process data much quicker overall. They actually have to restrict themselves to operate in 32 bit mode. Look at programs like Photoshop and Excel etc. and you definitely notice the speed difference when running 64bit.  Also, unlike days past, memory restrictions are not an issue.  Who cares if a program is 20% larger in modern systems?

    The other point is memory and storage access, which is severely limited with 32 bit.  Only the basic machines now have less than 4 GB, and multitasking etc. on larger memory systems again is restricted severely with 32bit.

    Going by some of the beliefs here then we should all really go back to 16 bit because that would be small, maybe even 8 bit? Small is faster….

    Personally I would like to see 32 bit phased out completely and everything run true 64 bit, especially on desktop and modern hardware.   Maybe for some IOT devices then 32 bit is fine (or 16) but for general programs then 64 bit is definitely the way to go.

    I believe if Windows was a complete 64bit system it would be 50 to 100% faster than the hybrid system it currently is. Same with any modern OS if they shifted to strictly 64 bit.

  11. ricom says:

    Sadly memory doesn't work like that.  The processor always reads memory via the cache.  The cache is the same regardless of the size of the instruction set being used for any given processor.  Better instructions will mean you can get data out of the cache faster, but not read memory faster.  If you need more cache because your data got bigger you will incur more cache misses.

    Many modern programs are dominated by memory bandwidth rather than instruction execution.  The machine word size doesn't affect memory bandwidth at all.

  12. ricom says:

    I'm not sure that was very clear.  Let me try to add a bit more.  If you grow your data the problem becomes how quickly you can move the data from the main RAM into the processor cache.  That rate is not affected by the instruction set, only the amount of memory that has to be moved.  Because moving memory into the cache is so much slower than executing instructions it is frequently the dominant effect in systems that are rich in pointers.

  13. Most Interesting Man in the World says:

    I don't always write high performance code, but when I do, I write Z80 assembly.

  14. Tom says:

    Currently you may be correct, however the arguments you make are very (repeat very) reminiscent of the arguments to avoid 16 bit code during that switchover, and the arguments to avoid 32 bit code during that switchover. I am unconvinced that even in the medium term it will make any actual real world performance difference, and I expect that in the long term (10-15 years) we will be having a similar argument about 64 bit vs. 128 bit code; that seems to be the pattern.

    Yes, right now Processor Cache to Memory bandwidth is a problem. 5 years ago disk to memory bandwidth was a problem, 5 years before that Memory size was a problem, 5 years before that Processor to Memory bandwidth was a problem…see a pattern?

    The short answer – analyze your problem, analyze what hardware you expect to run on, analyze what your users expect, and analyze what the lifespan of this particular iteration of the product is; use that to pick your target: 32 vs. 64 bit.

  15. ricom says:

    Well actually those are still problems

  16. Fred Bosick says:

    We won't be going to 128 bits any time soon. Just look at the memory possible with 64 bit addressing.

  17. OldETC says:

    Compiling the same code with the same flags may not generate 64 bit code.  Check that your Make file or other control file actually sets up properly for 64 bits.  Also the compiler design may restrict the floating points to 32 bits which is actually slower on the 64 bit architecture.  Use doubles if in doubt.

  18. F.D.Castel says:

    One has to love when people counter plain facts with "I am unconvinced", "I believe" and other gems ("50% to 100% faster"!! OMG! What are we waiting for??).

  19. Cubee says:

    What Rico is explaining here could be qualified as common sense. If you don't get that, you should not be writing code that depends on the stuff that is explained here. It is definitely NOT the same argument as the old 16 vs 32 bit. And I really can't seem to remember that argument anyway; everybody was happy with that particular improvement. But since 16 bit was mentioned: yes, a random program that would fit completely in 16 bit memory space and could use the same addressing mechanism as the current 32 bit code – so not the old segmented mechanism – would on average be faster than its 32 bit counterpart.

    There is no black magic inside a CPU. Stuff occupying more space is slower to move than stuff occupying less space, and it still occupies the same amount of space after it has been moved.

    "Pointers, messing with peoples minds since…well ever"

    As a side note, so you know where I am coming from, the application that I am working on benefits greatly from 64 bit addressing since we actually have data structures that need to be contiguous in memory and are super large. That is what you get when you have old – but tried and tested – number crunching algorithms being used in the 21st century on models 100 times the size of the old ones.

  20. Ron says:

    This all sounds familiar. "640K ought to be enough for anybody."

  21. ricom says:

    Well, no actually, it's nothing like "640k ought to be enough for anyone".  There are tons of workloads that cannot avoid having buckets of data resident.  There are also plenty of programs that load way more data than they need to load and if any care was used at all they would be very fast.  Allowing them to bloat even more, so they can run more slowly than ever, isn't a better answer than "fix your ***."

    Maybe I should have said, as was once told to me years ago, "Cache is the new RAM;  RAM is the new disk.  Stop using RAM stupidly, you never would have done this with disk."

  22. JohnM says:

    Alexandrescu claims 64 bit code with 32 bit data (where possible) will be faster, largely because 64 bit compiler optimizations beat out 32 bit optimizations, simply because of the work being put into them. I've no idea whether that's true or not, but I doubt he's making it up.

    http://www.youtube.com/watch

  23. Dirk says:

    Your posts are truly a very good discussion of the pros and cons of 32-bit vs 64-bit code. Besides performance there is one very important aspect that I didn't find in your articles: security.

    Only with a 64-bit process you can actually benefit from (high-entropy) ASLR. Within a 32-bit process ASLR can be defeated by brute-force in very short time.

  24. ricom says:

    @Dirk: I did mention security in there somewhere in one of them.  But you're exactly right, ASLR is a key benefit.  There are others — such as no WOW.

    1. Pete Wilson says:

      While this post is a year old, it seems like ASLR isn’t that useful today.

  25. Dirk Bester says:

    Seriously guy, who gives a damn about all this bs rationalization.

    Just read this acronym: ASLR

    Now imagine that customers don’t care if the code is whatever size and runs 20% slower or whatever. Imagine that the only thing that matters is security. It is not just that you guys are too goddamn lazy to switch to 64 bit two decades later. It is that you inevitably force everyone to release 32 bit software because that is the only pathway that for sure works. Any 64bit in your code and suddenly you need a $5k license from Installshield just to package it or you have to spend $20k of your time figuring out some other free way that is totally shitty in comparison.

    Now imagine that I worked on a 4.5 million line codebase. Do the symbols fit into your tiny shitty slice of my dev machines now 64GB of memory? They do? What if I add resharper? Does it still fit? If you think they do, what does it mean: “ran out of memory processing symbols”? Do you have any idea how much donkey balls your IDE sucks because of its lack of real 64 bit memory support?

    Now imagine that I also program on a Mac. Oh wait, that has been 64bit since one year after the available hardware and 32 bit got abandoned before the dawn of recorded history. Zero memory problems.

    Meanwhile because MS loves to make life a complicated fustercluck there is still a 32 bit version of Windows goddamn 10. Yeah! All my code has to be both 32 and 64 bit. Fan fucking tastic. Hurray! Nothing can go wrong from such needless complication! The KISS principle? That’s Gene Simmon’s philosophy of music right? Not something to program by.

    This rant can go on forever, because all this 32 bit shittiness leads to shitty 32 bit code still being made and running on my OS that has been 64 bit for many versions now and causing it to be slow because now the OS has to have shitty 32 bit code as well so it can work with those shitty 32 bit programs. You claim this is not a shitty situation but you, MS, have literally released shitty Visual Studio software that is broken specifically because you yourself suck at handling the complications of compiling and assembling 32 and 64 bit code. Why? Because simply having both means that your test guys have to now cover 32/64 and 32and64 and then Intel and AMD of those three and never mind the non x86. Just imagine how much programmer time is lost each year as people stare at two pages of shitty binaries they need to download but god only knows which is just right for their project and goddamn you if you misclick or someone chooses poorly. Imagine how much faster microsoft and all its dev ecosystem can deliver code if you would just move to 64bit PERIOD. I know I have personally wasted at least four months on shitty 32 vs 64 bit issues.

    Just admit that you are wrong. Just publicly state that you suck at modern coding. The world will applaud your refreshing honesty. They did for the Google engineer that outed the shitty cable manufacturers that were frying hardware with their non spec shitty fake USB cables. They did for the Google engineer that outed the Nest as a piece of non functional crap. I can attest that it used to heat and cool until some update made it stop doing anything unless it sees you every hour like you are some kind of monkey in a performance art piece. Now it is your turn. Stop trying to spin lipstick onto this pig. Stop making your devs be 32 bit monkeys. Eventually people move to other ecosystems and never look back. Then you lose your job and your kids have to go to shitty public schools and the whole shitty mess lands on your head. Get your shit straight man, before it is too late.

    Anyway, that’s my first world nerd rage for the month.

    1. Dirk Bester says:

      I just cannot let this go. By keeping Visual Studio a 32 bit product you are specifically not eating your own dog shit. Er I mean dog food. You write 32 bit code and then make your 32 bit product using your 32 bit code. Later your browser guys have to keep making a 32 bit browser because nobody makes 64 bit plugins. Office at least makes a 64 bit version but nobody can use it because, once more, no extensions are 64 bit. Unless you have a big Excel spreadsheet, and then good luck buddy, I hope you like 64 bit Excel without any plugins whatsoever.

      It’s the 32 bit circle of life. Everything is 32 bit because Visual Studio is 32 bit. Visual Studio is 32 bit because statistically everything is 32 bit so clearly devs love 32 bit.

      Won’t you break the cycle of incompetence?

  26. Tom Kerrigan says:

    Why are your 64-bit binaries bigger, and why do you consider that “typical”?

    AMD64 instruction lengths are basically the same as x86 instruction lengths. Plus, AMD64 has more registers, which means less register-juggling, which means fewer instructions. So I would expect binaries to be very close to the same size, and if anything, there’s a reason to expect them to be a little bit smaller.

    I just compared 32-bit vs. 64-bit sizes of a small project I work on, and the 64-bit version is 3% smaller. (And 4% faster.) This is using the GNU toolchain on OS X.

    I also just checked the sizes of the 32-bit and 64-bit versions of Firefox. The 64-bit Windows installer is 5% bigger and the 64-bit Linux installer is 1% smaller.

    If your binaries are growing by 20% to 40%, I would suggest there’s a bug in your toolchain that you should probably sort out ASAP.

  27. Reelix says:

    Care to speculate as to why MS Paint, Notepad, and Calculator are 64-bit apps in a default installation of Windows 10 x64?

    We hear the same anti-x64 reasons over and over again (Commonly – "There's no need" and "It's slower") – So why did the most common apps in existence make the migration whilst the apps people would expect to (Developer apps / Open-Source apps) did not?
