What assembly language DOES your code generate?

Pat Niemeyer had a fascinating comment in my article about programmers knowing roughly what assembly language their code generates:

Your example serves to demonstrate that developers should, under normal circumstances, *not* care about low level code or performance issues... They should just do the "right thing" where that is usually the simplest and most straightforward thing. If developers had followed this rule in Java all along then they wouldn't have been bitten by all these stupid optimization rules that ended up being anti-optimizations when the VMs got smarter.

Trust the compiler people... trust the VM people and then when your code is working trust the profilers to tell you what is actually going on. People's intuition about performance issues is wrong more often than it's right. You don't know anything unless you profile it - zip, nada, nothing. You may think you're a hotshot Java programmer (we all do) but you're wrong about most of what you think about optimization. That's the real rule.

At some level, Pat’s absolutely right – the VM and compiler people know far better than you do how to optimize your code.  But please note my use of the word “roughly” – you don’t need to know the exact instructions that are generated.  In fact, as I mentioned above Pat’s comment, in the face of modern optimizing compilers it’s actually a bad idea to write your code in assembly unless you REALLY know what you’re doing.  (I’d be willing to trust someone like Mike Abrash, or maybe a couple of others, to write assembly language code for modern processors, but I certainly wouldn’t attempt it myself, even though I spent the first 10 years of my career at Microsoft programming exclusively in x86 assembly language.)

When I started writing x86 assembly language, the rule of thumb for performance was pretty simple:  As long as you don’t use multiply or divide, the smaller your code, the faster it is.  That was all you needed to know.  With the Pentium, this changed for Intel processors (it had always been the case for RISC processors).  The infamous Appendix H described in full how to write optimal code for the Pentium series processors.

All of a sudden, it became quite difficult to actually write efficient code.  And for every release of Intel’s processors, it’s become harder and harder to write the fastest code possible in assembly – typically compilers can do a much better job of it than mere mortals.

But I still stand by my statement – I disagree with Pat, especially if you’re writing systems code.  If you’re introducing several thousand extra instructions inside your POP3 server because you chose to do error processing by throwing exceptions (no, I didn’t do that in the Exchange POP3 server), or if your I/O subsystem is twice as large as it should be because you used macros for critical sections, you may in fact be doing the most “straightforward thing”.  But unless you know what the performance implications of doing the “most straightforward thing” are, it’s highly likely that the “straightforward thing” is what stops your POP3 server from being able to support 50,000 users on a single Exchange server.  Or it’ll be the difference between serving 100 million hits a day and 50 million hits a day.


Comments (9)

  1. Anonymous says:

    Profiling before optimization is an excellent idea, but honestly, the rules for optimizing modern x86 assembly are not much different than optimizing modern C++ — good algorithms and streamlined data paths. Fast asm code tends to have a lot of arcane tricks in it but you can do most of those tricks in a high-level language too. Along those lines, bad assembly tends to be slow because it uses lousy algorithms, not because of assembly-specific issues.

    Performance is not the only problem that arises with not knowing assembly, however. Programmers that do not know assembly also basically cannot debug an optimized build, because they throw their hands up in the air when the call stack is wrong, local variables don’t show up, or they only have a post-mortem report to work from. Some might say that .NET frees you from having to deal with such low-level issues, but anyone who has read Chris Brumme’s weblog knows there are plenty of .NET-specific gotchas, like finalization ordering issues.

    Just because you don’t _need_ to know how everything works doesn’t mean that knowledge isn’t useful.

  2. Anonymous says:

    I absolutely agree with you about programmers needing to know assembly language – I can’t count the number of times I’ve had to bail out other developers because they couldn’t post-mortem a crash because the optimizer had morphed their code into a form that was essentially unrecognisable.

  3. Anonymous says:

    I recently debugged a problem for a poster on CodeProject’s Visual C++ forum who was having trouble with using the << operator to output a CString object to an ofstream. Under Visual C++ 6, this outputs the address of the underlying string, not the contents.

    By being able to interpret the assembly generated by the compiler, I could see that for some reason it had selected the member basic_ostream<>::operator<<(const void*) rather than the global function operator<<( const basic_ostream<>&, const char*). undname is your friend in a C++ disassembly! It appears to be a compiler bug because simply adding an (LPCTSTR) cast, rather than relying on an implicit call of the conversion operator, fixes the problem.

    Knowing your programming language and environment is probably more important than knowing assembly, in general.

    I’ve also done something more dubious. A Pocket PC OEM provides an API for their device to produce tones from a built-in beeper (rather than the wave audio device). Unfortunately the implementation suffers from a race condition which sometimes causes a different tone from that requested to be generated (an ear-piercing shrill). Since I know ARM assembly fairly well, I was able to reverse-engineer it to the point that I could bypass their API and call the driver directly.

    Knowing assembly is even more important on Pocket PCs because there are no OS symbols, and the return address isn’t stored in a consistent position on the stack on the ARM architecture (the symbols tell the debugger where to find it). If you get an exception in a system function, you have to read the code to find where it stored the return address in order to walk back to your code.

  4. Anonymous says:

    Actually Mike, that ARM restriction is for most RISC architectures. I had the same issues with MIPS and Alpha machines back in the day.

    Btw, as far as I know the VC6 behavior isn’t a compiler bug – I think it has to do with the fact that a CString isn’t a const char *.  You need to explicitly tell the compiler to use the const char * overload on CString, because there’s a built-in conversion to void *.

  5. Anonymous says:

    You’ll also have the restriction on Itanium processors, whose architecture is very unfamiliar to those used to the x86.

    The disassembly told me something different: it was calling CString::operator char const* (or operator LPCTSTR to you and me) to get the pointer. There is no conversion to void* that I can see.

    VS.NET in its default configuration does the right thing, but whether that’s due to the massive changes in the compiler (which are extensive), or that MFC 7’s CString is a totally new implementation, I don’t know.

  6. Anonymous says:

    Hmm. I’m not sure – I believe that there’s an implicit conversion to void * for every object in C++.

    Also, the last time I looked, I saw the same CString issue with VS.NET 2K3, but it may be that the NT tree isn’t running the most recent ATLMFC bits (don’t ask, it’s a LONG story).

  7. Anonymous says:

    I never understood why people "do error processing by throwing exceptions". 99.99% of the time, returning an error code is simpler and faster.

  8. Anonymous says:

    B.Y., error handling via exceptions is a whole ‘nother ball of wax – Raymond, Mike Grier, and I have all touched on it over various times.

    Bottom line: None of us understand it either 🙂
