Larry’s rules of software engineering, part 1: Every software engineer should know roughly what assembly language their code generates.

The first in an ongoing series (in other words, as soon as I figure out what more rules are, I’ll write more articles in the series).

This post was inspired by a comment in Raymond’s blog where a person asked “You mean you think I’m expected to know assembly language to do my job?  Yech”.

My answer to that poster was basically “Well, yes, I do expect that everyone know assembly language”.  If you don’t, you don’t really understand what your code is doing.

Here’s a simple quiz:  How many string objects are created in the following code?

#include <string>

int __cdecl main(int argc, char *argv[])
{
      std::string foo, bar, baz;

      foo = bar + baz + "abc";
}
The answer?  5.  Three of the strings are obvious – foo, bar, and baz.  The other two are hidden in the expression: foo = bar + baz + "abc".

The first of the hidden two is the temporary string object that holds the intermediate result of bar + baz (operator+ groups left to right, so that addition happens first).  The second is the temporary that holds the result of adding "abc" to that intermediate string, which is then assigned to foo.  That one line of code generated 188 bytes of code.  Now that’s not a whole lot of code today, but it can add up.
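If the hidden temporaries matter, one way to sidestep them is to append into a single string. The following is only a sketch: `concat_in_place` is a hypothetical helper name, not something from the original code, and whether it really produces less code than the one-liner is something to check in your own compiler's listing.

```cpp
#include <string>

// Hypothetical helper: builds bar + baz + "abc" by appending into one
// string object instead of chaining operator+.
std::string concat_in_place(const std::string& bar, const std::string& baz)
{
    std::string foo;
    foo.reserve(bar.size() + baz.size() + 3); // room for all three pieces
    foo += bar;
    foo += baz;
    foo += "abc";   // appends the literal directly to foo
    return foo;
}
```

The result is the same string the expression would produce; the difference is only in how many intermediate objects the compiler has to construct and destroy along the way.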

I ran into this rule a long, long time ago, back in the DOS 4 days.  I was working on the DOS 4 BIOS, and one of the developers who was working on the BIOS before me had defined a couple of REALLY useful macros to manage critical sections.  You could say ENTER_CRITICAL_SECTION(criticalsectionvariable) and LEAVE_CRITICAL_SECTION(criticalsectionvariable) and it would do just what you wanted.

At one point, Gordon Letwin became concerned about the size of the BIOS; it was like 20K and he didn’t understand why it would be so large.  So he started looking.  And he noticed these two macros.  What wasn’t obvious from the macro usage was that each of those macros generated about 20 or 30 bytes of code.  He changed the macros from inline functions to out-of-line functions and saved something like 4K of code.  When you’re running on DOS, this is a HUGE savings (full disclosure – the DOS 4 BIOS was written in assembly language, so clearly I knew what assembly language I generated.  But I didn’t know the assembly language the macro generated).
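The same trade-off can be sketched in C++. This is an illustration only: the real BIOS macros were assembly and are not reproduced here, and the `CriticalSection` struct and the lower-case function names are hypothetical stand-ins.

```cpp
// Hypothetical stand-in for whatever state the real macros manipulated.
struct CriticalSection {
    int depth = 0;
};

// Macro version: the body is pasted into every place the macro is used,
// so N uses cost N copies of the generated instructions.
#define ENTER_CRITICAL_SECTION(cs) do { ++(cs).depth; } while (0)
#define LEAVE_CRITICAL_SECTION(cs) do { --(cs).depth; } while (0)

// Out-of-line version: one copy of the body, and each use is just a call.
// (An optimizer may still choose to inline these if it sees the definition.)
void enter_critical_section(CriticalSection& cs) { ++cs.depth; }
void leave_critical_section(CriticalSection& cs) { --cs.depth; }
```

Both forms look identical at the call site, which is exactly why the cost of the macro form is easy to miss without looking at the generated code.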

Nowadays, memory pressures aren’t as critical, but it’s STILL critical that you know what your code is going to generate.  This is especially true if you’re using C++, since it’s entirely possible to hide huge amounts of object code in a very small amount of source.  For instance, if you have:

CComPtr<IXMLDOMDocument> document;
CComPtr<IXMLDOMNode> node;
CComPtr<IXMLDOMElement> element;
CComPtr<IXMLDOMValue> value;


How many discrete implementations of CComPtr do you have in your application?  Well, the answer is that you’ve got 4 different implementations – and all the code associated with CComPtr gets duplicated FOUR times in your application.  Now it turns out that the linker has some tricks that it can use to collapse identical implementations of methods (and it uses them starting with VC.Net), but if your code is targeting VC6, or if it’s using some other C++ compiler, you can’t guarantee that you won’t be staring at <n> different implementations of CComPtr in your object code.  CComPtr is especially horrible in this respect, since you typically need to use a LOT of interfaces in your application.  As I said, with VC.Net onwards this isn’t a problem, because the compiler/linker collapses all those implementations into a single instance in your binary.  But for many templates, this doesn’t work.  Consider, for example, std::vector.

std::vector<short> document;
std::vector<int> node;
std::vector<float> element;
std::vector<bool> value;

This requires that there be four separate implementations of std::vector compiled in with your application, since there’s no way of sharing the implementation between them (the element types differ in size or representation, std::vector<bool> is a separate bit-packed specialization, and thus the assembly language for the different implementations is different).  If you don’t know this is going to happen, you’re going to be really upset when your boss starts complaining about the working set of your application.
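A small sketch of why the instantiations can't simply be shared. The `byte_offset_of` helper is hypothetical and only stands in for the kind of address arithmetic a container has to bake into its generated code; it is not how any particular std::vector is implemented.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Each instantiation is a distinct class, so its member functions are
// compiled separately.
static_assert(!std::is_same<std::vector<short>, std::vector<int>>::value,
              "different element types give different classes");

// std::vector<bool> is a specialization: element access goes through a
// proxy type rather than a plain bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> does not hand out bool&");

// Hypothetical helper: the byte offset of element `index` depends on
// sizeof(T), so each instantiation contains different arithmetic.
template <typename T>
std::size_t byte_offset_of(std::size_t index)
{
    return index * sizeof(T);
}
```

Calling `byte_offset_of<short>(10)` and `byte_offset_of<double>(10)` compiles two separate functions with different constants in them, which is the std::vector situation in miniature.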

The other time that not knowing what’s going on under the covers hits you is when a class author accidentally hides performance problems in their class. 

This kind of problem happens a LOT.  I recently inherited a class that used operator overloading extensively.  I started using the code, and as I usually do, I started stepping through the code (to make sure that my code worked) and realized that the class implementation was calling the copy constructor for the class extensively.  Basically it wasn’t possible to use the class at all without half a dozen trips through the heap allocator.  But I (as the consumer of the class) didn’t realize that – I didn’t realize that a simple assignment statement involved two trips through the heap manager, several calls to printf, and a string parse.  The author of the class didn’t know this either; it was a total surprise when I pointed it out to him (since the calls were side effects of other calls he made).  But if that class had been used in a performance critical situation, we’d have been sunk.  In this case, the class worked as designed; it was just much less efficient than it had to be.

As it is, because I stepped through the assembly and looked at ALL the code that was generated, we were able to fix the class ahead of time to make it much more implementation friendly.  But if we’d blindly assumed the code was fine because it functioned correctly (and it did), we’d never have noticed this potential performance problem.

If the developer involved had realized what was happening with his class, he’d have never written it that way, but because he didn’t follow Larry’s rule #1, he got burned.
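A minimal sketch of how such hidden copies creep in. `Tracked` is a hypothetical class that counts its own copy-constructor calls; it is not the class from the story, just the smallest thing that shows the effect.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class that counts how often it is copied.
class Tracked {
public:
    static int copies;

    explicit Tracked(const std::string& text) : text_(text) {}
    Tracked(const Tracked& other) : text_(other.text_) { ++copies; }

    std::size_t length() const { return text_.size(); }

private:
    std::string text_;
};

int Tracked::copies = 0;

// Pass by value: every call with an existing object runs the copy
// constructor (and, for a long string, a trip through the heap).
std::size_t length_by_value(Tracked t) { return t.length(); }

// Pass by const reference: no copy is made.
std::size_t length_by_reference(const Tracked& t) { return t.length(); }
```

The two functions read almost the same in source and return the same answer; only the counter, or a look at the generated code, shows that one of them does extra work on every call.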


Comments (20)
  1. Anonymous says:

    That’s all nice and good if you work as a contractor – you get paid by the hour. I personally think that simple, readable (and therefore manageable and less prone to bugs) code is more important than code that saves 20 bytes of memory and runs two milliseconds faster. Hardware is cheap compared to programmers, even those in India. But if your team writes code that’s full of bugs because they’re trying to play tricks with the compiler to make the code run a little bit faster and use a little less memory, and then spends weeks trying to find those buggers, you’re just wasting money and time.

    Remember people who were smart and used the top bit in memory pointers to hold their flags? That saved some memory but created code that needs to be completely re-written (to run in an environment that can address more than 2GB of RAM). And there are tons of other examples…

  2. Anonymous says:

    The problem with "knowing" what the assembly looks like is that your knowledge can get out of date.

    Lots of Java programmers "know" that they shouldn’t do String concatenation by writing x = a + b + c + d + e;, for instance — that allocates all sorts of extra strings (as in your example above) and is just terribly inefficient.

    Only, it doesn’t: The compiler actually translates those concatenations into StringBuffer calls invisibly, and it all works just performantly delicious.

    So for the past umpty-ump years, I’ve been dealing with code that’s needlessly full of StringBuffer.append() operations, when the programmers could have used simple straightforward concatenation, because they "knew" the assembly that was generated, based on their experience with, I dunno, Java 1.1 or something.

  3. Anonymous says:


    I’d almost want to agree with you. But I can’t think of the number of times that I’ve been called in to debug someone else’s performance problem because they didn’t understand why their programming practices generated bad code. STL generates pretty good code (especially in VC.Net) but you still need to know what’s going on under the covers.

    It’s only tricks with the compiler if you’re being tricky using the compiler – and I include using templates or operator overloads as being tricky (more on that at a later date) 🙂

    Mike, your point is valid, and it works IFF you have runtime support underneath. You will note that I didn’t comment on the behavior of either managed code or Java, because there are other tricks that happen with those. The examples I used were all built with VC.NET 2K3, using the STL libraries that come with the compiler (I built a bunch of toy apps to verify the stuff I wrote above, just to make sure it still happened with the current compiler).

  4. Anonymous says:

    I kind of agree with Mike, except I think people should avoid C and C++ as much as possible. A very high percentage of those who write code are frankly bunglers (at least when it comes to handling pointers and memory), and _everybody_ makes mistakes.

    Languages like C#, Java and Python allow us to write code with fewer faults (and as "someone else" deals with most of the hard and boring stuff, the programs are often faster than they would’ve been had we coded them ourselves).

    Those languages, I could probably follow, but my fiancé could not, and she’s still a competent coder in those languages, but then, we regularly use languages like Scheme, SML and Prolog, and those would probably be hard to follow for me.

  5. Anonymous says:

    Larry, if somebody doesn’t understand that their innocent line of code is going to create three unnecessary objects then the problem isn’t knowing what assembly code gets generated. It’s not knowing how the language you use works. If you knew C++ and how it works with objects you’d know when extra objects are created, without knowing what the actual assembly code is. I think you wanted to say – do not write code unless you know what you’re doing. Unfortunately not many people do…

    And I agree that operator overloads are playing tricks, most of the time they make the code very difficult to understand (and therefore maintain).

  6. Anonymous says:

    What’s the best way to start learning assembly language – books? Can you recommend any?



  7. Anonymous says:

    An interesting question bg. I’m actually not sure I know of any that are still in print, but the good news is that I don’t think it really matters that much.

    Jerry’s comment above is actually spot-on. It’s not so much the actual instructions (although they DO matter at some point), it’s what code is being generated.

    One thing that I might suggest doing is to use the compiler to generate a .COD file of your source code, then look up the instructions in the Intel processor reference.

    The bottom line is that I’m not sure that it’s necessary to know how to PROGRAM in assembly language any more (and I could put forward a very strong argument that it’s a bad idea to program in assembly language at this point), but it IS important that you know how to READ assembly language.

  8. Anonymous says:

    The problem with saying that hardware is cheap compared to programmers is that hardware costs scale linearly with your user base, while programmer costs don’t. If you’ve got a sufficiently large user base then you’d have to have some pretty expensive programmers for this to be true.

  9. Anonymous says:

    I still claim that this comes down to two kinds of programmers. "Systems programmers" who have to know what the heck they’re doing and "applications programmers".

    "Applications programmers" are the ones with the massive application backlog and need to work through it. Frankly they don’t have a real quality or performance problem for the most part because the ones who have a clue leave the heavy lifting to the "systems programmers" who are writing things like Exchange and SQL Server.

    "Systems programmers" are building the components that everyone else leverages and they can’t afford to play fast and loose.

    If the top-level application chooses to write their code more quickly and sloppily and generates 20,000 extra copies of strings but the job still gets done in time, that’s great.

    If the component provider writes their code quickly and sloppily and generates 20,000 extra copies of strings, this is irresponsible and hopefully they will not remain being a shared component provider for long.

    Shared component providers also have a massive backlog, but the absolute last thing in the world that we need for shared component providers is the ability for them to turn out more shared components more quickly. Quality stinks and it’s not about buffer overruns and leak management – anyone who has problems designing for and addressing these issues isn’t qualified to write a shared component in the first place.

  10. Anonymous says:

    Why did you make the rule exclusive for assembly language? I would extend it to: Every engineer should understand at least the first layer below the one he’s using. For example .NET programmers should understand the internals of CLR.

    I dare to say that the quality of an engineer may be measured by the number of layers below the one he’s working in that he truly understands.

  11. Anonymous says:

    The point re replacing a+b+c with stringbuffer is only valid in a single expression. If you are using (for example) a loop, then the compiler will *not* generate a stringbuffer for concatenation. Even different expressions using the same string aren’t guaranteed to use an SB.

    What really gets on my tits is people who are using some kind of output stream (e.g. to a ServletResponse’s writer) and do:

    write(a+b+c); // or println(a+b+c)

    when you can get concatenation for free using




  12. Anonymous says:

    Your example serves to demonstrate that developers should, under normal circumstances, *not* care about low level code or performance issues… They should just do the "right thing" where that is usually the simplest and most straightforward thing. If developers had followed this rule in Java all along then they wouldn’t have been bitten by all these stupid optimization rules that ended up being anti-optimizations when the VMs got smarter.

    Trust the compiler people… trust the VM people and then when your code is working trust the profilers to tell you what is actually going on. People’s intuition about performance issues is wrong more often than it’s right. You don’t know anything unless you profile it – zip, nada, nothing. You may think you’re a hotshot Java programmer (we all do) but you’re wrong about most of what you think about optimization. That’s the real rule.

    Pat Niemeyer

    Author of Learning Java, O’Reilly & Associates

  13. Anonymous says:

    Pat Niemeyer: perhaps you’ve just proven Larry’s point even better than he thought. You’re stressing the importance of understanding optimizations, which is conceptually similar to understanding the assembly language (which itself is conceptually saying "Understand how your code works")

    So many times I see code written without regard to "how it works" or even "how does this get optimized." To be honest, I see a lot of cut and paste, which is a prime reason why samples should illustrate good practices.

  14. Anonymous says:

    In the base note:

    > foo = bar + baz + "abc";

    > The second is one that’s used to hold the
    > intermediate result of baz + "abc" which
    > is then added to bar to get the resulting
    > foo.

    OK, string concatenation is associative, and a compiler might make use of this knowledge in performing some optimization. But I don’t think a programmer should depend on observing that one version of some optimizing compiler used some trick once on one version of the programmer’s work. One maintenance programmer later, the results might be different.

    In this example, the programmer should start by assuming that bar + baz will generate a temporary, and the temporary "should" reuse itself in adding "abc" to the result. If the generated code is worse than this, complain to the makers of the compiler or the component, AND if performance is bad then temporarily find a workaround. If the generated code is better than this, be happy but do not depend on it.

    (By the way, if these were numbers instead of strings, and subtraction operators instead of addition, I hope your compiler gets the sequence of operations correct.)


