Santalic tailfans, part two

As I have said before many times, there is only one sensible way to make a performant application. (As an aside: perfectly good word, performant, deal with it!) That is:

  • Set meaningful, measurable, customer-focused goals.
  • Write the code to be as clear and correct as possible.
  • Carefully measure your performance against your goals.
  • Did you meet your goal? Great! Don't waste any time on performance analysis. Spend your valuable time on features, documentation, bug fixing, robustness, security, whatever.
  • If you did not meet your goal, use tools to discover what the worst-performing fixable thing is, and fix it.

Iterate on that process, updating your goals if it proves necessary, until you either have something that meets your goals or you give up.

My explicit goal for my little dictionary search program was that it give results fast enough that I would not be sitting there impatiently waiting for more than a few seconds. That's a very customer-focused goal, me being the only customer of this program. With only a very small amount of tweaking my program met that goal right away, so why would I spend any more time on it? The program takes just under two seconds to report the results for a typical query, which is faster than I can read.

But suppose that we did want to improve the performance of this program for some reason. How? Well, let's go down the list. We have goals, we have very clear code, we can measure the performance easily enough. Suppose we didn't meet the goal. The last thing on the list is "use tools to discover what the slowest fixable thing is".

A commenter conjectured that the performance bottleneck of this program was in the disk I/O. As you can see, every time we do a query we re-read the two-megabyte dictionary file dozens of times. This has the benefit of using very little memory; we never have more than one line of the dictionary in memory at a time, instead of the whole four megs it would take to store the dictionary. (Remember, the dictionary is ASCII on disk but two-byte Unicode when held in strings in memory, so the in-memory size doubles.)
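
The pattern being described, reading the file from scratch on every query, looks roughly like this. This is a sketch reconstructed from part one of this series; the names FileLines and Canonicalize follow that post, but the details here are illustrative rather than the program's exact code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DictionaryScan
{
    // Streams the dictionary one line at a time; only a single line is
    // ever held in memory, at the cost of re-reading the file per query.
    public static IEnumerable<string> FileLines(string filename)
    {
        using (StreamReader reader = new StreamReader(filename))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Each query then walks the entire file again, something like:
    //   var words = from line in FileLines(@"c:\words.txt")
    //               where racks.Contains(Canonicalize(line))
    //               select line;
}
```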

That's a conjecture -- a reasonable conjecture -- but nevertheless, it's just a guess. If I've learned one thing about performance analysis it's that my guesses about where the bottleneck is are often wrong. I'll tell you right now that yes, the disk-hitting performance is bad, but it is not the worst thing in this program, not by far. Where's the real performance bottleneck? Any guesses? Could you know without using tools?

Here's the result of a timing profile run of a seven-letter query with one blank, repeated 20 times:

43%: Contains
21%: ReadLine
14%: Sort
7%: ToUpperInvariant
15%: everything else

Holy goodness! The call to the Contains extension method in the query to test whether the dictionary word is in the rack set array is almost half the cost of this program!

Which makes sense, once you stop to think about it. The "Contains" method is by its highly general nature necessarily very naive. When given an array, 99.9% of the time it has to look through the entire 26-item array because 99.9% of the time, the word is not actually going to match any of the possible racks. It cannot take advantage of any "early outs" like you could do if you were doing a linear search on a sorted list. And each time through the array it has to do a full-on string comparison; there's no fancy-pants checks in there that take advantage of string immutability or hash codes or any such thing.
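
In other words, the extension method is doing something like the following. This is a sketch of the behavior just described, not the actual framework source:

```csharp
using System;
using System.Collections.Generic;

static class NaiveSearch
{
    // A straight linear scan with a full equality check per element. For a
    // rack set of 26 canonical forms, a miss (the common case) pays for all
    // 26 string comparisons; the early out helps only on a hit.
    public static bool NaiveContains(string[] items, string value)
    {
        foreach (string item in items)
            if (EqualityComparer<string>.Default.Equals(item, value))
                return true;
        return false;
    }
}
```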

We have a data structure that is designed to rapidly tell you whether a member is contained in a set. And even better, it already does the "distinct" logic. When I replace

    var racks = (from rack in ReplaceQuestionMarks(originalRack)
                 select Canonicalize(rack)).Distinct().ToArray();

with

    var racks = new HashSet<string>(
                from rack in ReplaceQuestionMarks(originalRack)
                select Canonicalize(rack));

suddenly performance improves massively. The "Contains" drops down to 3% of the total cost, and of course, the total cost is now half of what it was before.
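
Note that the call sites do not have to change: HashSet<string> exposes its own Contains with the same shape as the extension method, so the query clause reads exactly as before while running as a hash lookup. A tiny illustration, with made-up rack values:

```csharp
using System;
using System.Collections.Generic;

static class HashSetLookup
{
    public static void Main()
    {
        var racks = new HashSet<string>(new[] { "AALNSTY", "AALNSTT" });

        // Same spelling as the array version, but an O(1) hash probe
        // instead of a 26-element linear scan:
        Console.WriteLine(racks.Contains("AALNSTT")); // True
        Console.WriteLine(racks.Contains("ZZZZZZZ")); // False
    }
}
```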

Another subtle point here: notice how when I changed the type of the variable "racks" from "array of string" to "set of string", I didn't have to redundantly change the type thanks to implicit typing of local variables. I want to emphasize the semantics here, not the storage mechanism. If I felt that communicating the storage mechanism was an important part of this code -- because it has such a strong effect on performance -- perhaps I would choose to emphasize the storage by eschewing the "var".

With this change, the program performance improves to about one second per query and the profile now looks like this:

39%: ReadLine
23%: Sort
11%: ToUpperInvariant
7%: the iterator block goo in FileLines
5%: ToArray (called by Join)
15%: everything else

Now the bottleneck is clearly the combination of repeated file line reading (48%) and the string canonicalization of every dictionary line over and over again (37%).

With this information, we now have data with which to make sensible investments of time and effort. We could cache portions of the dictionary in memory to avoid the repeated disk cost. Is the potential increase in speed worth the potentially massive increase in memory usage? We could be smart about it and, say, only cache the seven- and eight-letter words in memory.
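
That partial cache might look something like this. It is a sketch of the tradeoff just described, assuming the queries of interest only ever need seven- and eight-letter words; it is not code from the actual program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartialCache
{
    // Read the dictionary once, keep only the lengths we query, grouped by
    // length so each query enumerates the smallest possible set of words.
    public static ILookup<int, string> Build(IEnumerable<string> lines)
    {
        return lines.Where(w => w.Length == 7 || w.Length == 8)
                    .ToLookup(w => w.Length);
    }
}
```

A seven-tile query would then enumerate cache[7] in memory instead of re-reading the file from disk.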

We could also attack the canonicalization performance problem. Should we perhaps precompute an index for the dictionary where every string is already in canonical form? This in effect trades increased disk space, increased program complexity and increased redundancy for decreased time. Should we use a different canonicalization algorithm entirely?
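
Generating such an index could be as simple as writing each word next to its canonical form, once, offline. This is a sketch; the tab-separated format and this particular Canonicalize are assumptions for illustration, not the program's actual helpers:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class IndexBuilder
{
    // Uppercase and sort the letters: "analyst" becomes "AALNSTY".
    public static string Canonicalize(string word)
    {
        char[] letters = word.ToUpperInvariant().ToCharArray();
        Array.Sort(letters); // the Sort cost, now paid once per word, offline
        return new string(letters);
    }

    // Emit "CANONICAL<tab>word" lines; queries can then match on the first
    // field without ever calling ToUpperInvariant or Sort again.
    public static void WriteIndex(IEnumerable<string> words, TextWriter output)
    {
        foreach (string word in words)
            output.WriteLine(Canonicalize(word) + "\t" + word);
    }
}
```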

All of these decisions are driven by the fact that I have already exceeded my performance goal, so the answer is "no". Good enough is, by definition, good enough. If I were using this algorithm to actually build a game AI, it would not be good enough anymore and I'd go with some more clever solution. But I'm not.

Comments (36)

  1. drdamour says:

    There’s a great lesson in there about assuming where the bottleneck is.

    I’m interested in knowing how you profiled or gathered the metrics you were using.  Some tool, or just a timer routine?

    Sufficiently buff editions of Visual Studio have an “Analyze” menu item. — Eric

  2. Matthew Kane says:

    This point about achieving customer-focused performance goals and stopping is definitely appropriate for application development. But when developing frameworks, I think you need to tweak the hell out of it. Nobody wants to be developing an application and find that they are not achieving their customer-focused performance goals because of a bottleneck in the framework. Often they don’t have the option of fixing the framework. Part of the point of a framework is to do the hard stuff that’s harder to maintain because the framework is more stable and doesn’t require as much maintenance. The application that needs to be maintained more because of changing requirements doesn’t need to deal with that stuff.

    Absolutely. And the way we make sure that the framework meets our customers’ performance needs is we talk to our customers and set meaningful performance goals based on the feedback they give us.

    The hard part is that there is almost never a win without a tradeoff. Just look at the trivial example we have here. Making this go faster means making some tradeoff, usually in the “more memory, less time” direction. But not all of our customers are time constrained. Many of them are memory constrained, particularly those customers who develop programs for hand-held devices. Knowing whether we’re maybe making things worse is a necessary precondition of making things better. — Eric

  3. Brad says:

    I indeed agree that setting goals and measuring against them is the correct approach, always, for real systems.

    For blog posts, sometimes it’s fun to dream about how to do things blazingly fast :).

    Building an index on the canonicalized form over the dictionary (say a B+tree or something) seems like it could be a win if you’re willing to double the disk usage to reduce the execution time substantially (since you wouldn’t even have to read in most of the dictionary then).

    I also tossed around the idea of using a precomputed Bloom filter to exclude most of your combinations out front without looking even into your index. This would probably allow you to have more than 2 blanks (which isn’t terrifically useful for Scrabble, but is for some other applications), since 90% of the time you could throw out a word without leaving the L2 cache.

    But indeed, for this application, for this purpose, there’s no reason to spend any more time on performance. That’s the difference between engineering and doing programming as a hobby.

  4. Frank Bakker says:

    While I agree there is more than one great lesson in here, there is one thing I believe is missing:

    For some reason I don’t think this piece of code was intended to be actually used while playing Scrabble (wouldn’t that be cheating?). I think it was made because Eric enjoyed writing it, and as a bonus it could serve as a subject for some interesting reading material.

    While the rationale ‘Good enough is good enough’ applies to the code we build for our customers, I find there is a very good reason to optimize the hell out of an algorithm that we built just for ourselves: it is a lot of FUN!

    At the point where Eric said "But suppose that we did want to improve the performance", we got into the world where the customer no longer rules and we are allowed to tweak it up to the last CPU cycle! Just because we can!

  5. I find the rationale “good enough is good enough” to be inappropriate for many problems I deal with.  I’m a student with an interest in pattern classification problems; a faster algorithm means using a larger (better) data set, or at least using more folds in cross-validation.  At the very least, it can mean more test runs, and thus a better understanding on my part of the behavior of the system.  Performance has yet to be fast enough, and I doubt it ever will be in my lifetime.

    A fundamental problem with the approach of writing code and then fixing it is that it’s not always easy to fix code that has the wrong performance characteristics.  For instance, HashSet<string> is rather slow due to the hash-code computation.  Replacing string with a custom type that precomputes the hash can improve performance manyfold; but if the application is complex, this new type must be used throughout the API to avoid recomputation.
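
The precomputed-hash idea mentioned here can be sketched as follows; this is an illustration of the technique, not a drop-in replacement:

```csharp
using System;

// Wraps a string and computes its hash exactly once; a set keyed on this
// type skips rehashing, and unequal hashes short-circuit the comparison.
struct HashedString : IEquatable<HashedString>
{
    public readonly string Value;
    private readonly int hash;

    public HashedString(string value)
    {
        Value = value;
        hash = value.GetHashCode();
    }

    public override int GetHashCode() { return hash; }

    public bool Equals(HashedString other)
    {
        return hash == other.hash && Value == other.Value;
    }

    public override bool Equals(object obj)
    {
        return obj is HashedString && Equals((HashedString)obj);
    }
}
```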

    Recently I was looking at basic vector operations – which raises the question: which implementation of a vector results in the fastest performance?  If you think you can just use an interface (say, IList) and then later replace the internals as appropriate, you’d be mistaken: a simple test shows that dot products computed on an array cast to IList are 50 times slower than when computed directly on the array.  Using a C++ implementation improved performance a further 60%.  In 64-bit mode, results were even more awkward: simply using a static, presumably trivially inlinable method to compute the dot product (a setup with low to no overhead in 32-bit .NET and C++) resulted in a slowdown of more than a factor of 2.

    My conclusion would be that, in particular in .NET, semantic details such as IList vs. List vs. array make large differences.  If your application is likely to be disk-bound or in some other way “complex” then sure, you can afford to ignore such differences – but that’s mostly because CPU time is often not an issue at all.

    So sure, you could write an app and then fix the performance, but if you know which operations have reasonable overhead up front, you can avoid using other operations in tight loops altogether.
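
The comparison being described can be sketched as two versions of the same loop; the 50-times and 60% figures above are the commenter's measurements, not something this sketch asserts:

```csharp
using System;
using System.Collections.Generic;

static class DotProducts
{
    // Direct array access: the JIT can hoist bounds checks and inline.
    public static double DotArray(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // Identical logic through IList<double>: every element access becomes
    // an interface call that the JIT typically cannot devirtualize here.
    public static double DotList(IList<double> a, IList<double> b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Count; i++)
            sum += a[i] * b[i];
        return sum;
    }
}
```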

    I placed the results online:

    Two points.

    First, I agree that making good decisions up front during the early design and implementation is key, particularly for large systems. Something that I’ve ignored for the purposes of this discussion is that for a large system, the process I’ve outlined needs to start early. If an early prototype of a large system is orders of magnitude away from the performance goal, you might have a big problem. Of course, that necessarily requires having known goals!

    Second, though I agree that faster is better, obviously faster costs effort. Lots of effort.

    I’m sure you would agree that more robust is also better. And more fully featured is better, less buggy is better, more secure is better, better documented is better, all these things are better and they all cost effort. There is only a finite amount of effort to go around on any project. Effort is our most precious resource and it needs to be doled out carefully to make sure that we’re spending it where it can do the most good.

    — Eric

  6. CGomez says:

    Some of us use Visual Studio Express Editions.  While we have CLRProfiler to help spot out-of-control allocations, what do you suggest we do to spot performance issues?

    I am an expert on the C# programming language, not on what toolsets are available from what providers for what price. So I do not make suggestions. — Eric

    MSFT expects hobbyists and academia to use Visual Studio Express Editions as an entry point into building MSFT applications.  However, how do we train the next wave of engineers to measure applications without tools?

    I am an expert on the C# programming language, not on budgeting of educational institutions. So, I don’t know. — Eric

    In the workplace, there is tremendous pressure to use open source platforms because of perceived cost savings and ROI.  While MSFT has great studies showing that you get tremendous ROI with expensive editions of Visual Studio, these studies fall on the deaf ears of CFOs.  What can engineers who want to keep using tools like Visual Studio for coding do when they can’t produce performant applications but also cannot measure to improve them?

    What do you suggest? This has plagued a department under fire to switch to “free” tools for some time.

    I am an expert on the C# programming language, not on convincing CFOs that getting the best tools money can buy is a sensible investment. So I don’t have any suggestions here either.

    Sorry that I’m completely unhelpful — these are excellent points all, but are also all questions about things I know nothing about. You should ask these questions of people like Soma; it’s his job to think about this stuff. — Eric

  7. Eric, while I like your new way of inserting your answers to comments, could you make them another color/background? I think that would make the two easier to separate visually and there would be no need to use [[ ERIC: ]].

    Good idea. How’s this? — Eric

    Also, a nice illustration of how to improve perf without losing expressiveness of code. I might borrow that example when giving training courses, if you don’t mind 😉

  8. Don B says:

    A couple of comments.  (Zero: I like your blog.)

    1) I like your suggestion to immediately incorporate tools, if you have them.  People who haven’t investigated performance issues before tend to want to guess, and it’s a waste of time.

    2) I’d like to encourage beginners who are new to profilers to remember that everything has to add up to 100%.  In the example, the first performance issue is taken off the table, and then there are others that apparently take its place.  A common theme is for someone to clear out the 40% item and then say "Oh my gosh!  Now XYZ() is taking 50%!  How did that happen?"  But SOMETHING has to take up the 100%, even if it’s so blazing fast that the program is done before the sound from the mouse click reaches your ears.

    You address this concern implicitly by saying "the program is fast enough now, so we don’t need to tweak it any more." (advice with which I totally agree).  

    Anyway, just a reminder to beginners that just because something takes the most time doesn’t mean it’s slow.  Thanks.

  9. CGomez says:

    Eric, I just wanted to say thank you for your response.  I appreciate your being upfront that you are not the right person to direct these questions to.  I think too many times commenters would walk away angry without realizing that just because someone makes a post about point A does not mean they are the right person to ask about tangential point B.

    Thanks again.

    You’re welcome. Thanks for your comments; you raise excellent questions. — Eric

  10. Smartass says:

    Eric, you can write a program that uses high-level concepts and then use a profiler to optimize it from 2 seconds to 1 second, or you can simply write a good program to begin with, and it will take 1 microsecond to compute the solution.

    Where “good” means what? Apparently in your case good means “incorrect, unnecessarily hard to read, and unnecessarily fast”. I have a different definition of what makes code “good” — “correct” is first on my list. — Eric

    struct Dict {
      StrMap<cstring> words;
      cstring in,temp;
      Dict() {
        cstring contents(FileToString("c:\words.txt"));
        cstring line;
        for(StringIter iter(contents); !iter.EofQ(); ) {
          strptr orig_line=iter.GetLine();
          // ...
        }
      }
      void Lookup(in_strptr word, cstring &answers) {
        int idx;
        if ((idx=in.Find('?'))!=-1) {
          int idx2=in.FindFrom('?',idx+1);
          if (idx2!=-1) {
            frep_(i,26*26) {
              // ...
              if (shstring *result = words[temp]) {
                answers<<" "<<*result;
              }
            }
          } else {
            frep_(i,26) {
              // ...
              if (shstring *result = words[temp]) {
                answers<<" "<<*result;
              }
            }
          }
        } else {
          if (shstring *result = words[in]) {
            answers<<" "<<*result;
          }
        }
      }
    };

    Results: 0.2secs to initialize, and 1.5microsecs for the ANALYST example.

    I know you’ll say that this is not fair, and that your post makes bigger points about the strategy of writing readable code, how to allocate programmer’s time, etc. Granted.

    I am confused. Why would I say it’s not fair? What’s unfair about it? Aside from the fact that your implementation of the algorithm gives incorrect results — it is almost always easier to go faster when you don’t worry about correctness!  — Eric

    But I think you’re missing the biggest point of all — that picking a good algorithm before sitting down to write the program will obviate the need to optimize later, because the resulting program will simply be as fast as it theoretically should be.

    I am even more confused. Your program uses almost the same _algorithm_ mine does. You just made the performance choice to cache the entire dictionary in memory rather than reading it off of disk every time. (You have not correctly cached the dictionary, of course, but you’ve cached enough of the dictionary to get the right answer for the small set of test cases you’ve tried. Hint: what happens when you do a search on the rack “OPST”? What results should you get? What results do you get?)

    You’ve traded massive memory use for a great improvement in search time, which I explicitly called out was a deliberate tradeoff that I chose to not make. What would you say to someone who turned this around on you and said that they needed to optimize for minimal memory use? Your program has terrible memory performance — it uses up megabytes of memory when it only needs to use a few KB. Why do you automatically assume that speed is the most important thing?

    Also, your program is nowhere even close to “as fast as it theoretically should be”. We could make it considerably faster. You haven’t thought about this problem very deeply if you think this is as fast as you can get it. How would you change this program if you needed to keep the index in the processor cache, for example? How would you change it if the word list were too large to fit into memory but you still needed the speed? The O(1) dictionary lookup depends on a hash; is the hash function the bottleneck? If so, how would you improve it? If any of these were concerns then clearly you have not written a “good” program. Your implementation makes implicit assumptions about both the size of the problem and about what constitutes a “good enough” performance result. — Eric

    It’s really a shame to use 1 or 2 seconds to do 26 dictionary lookups (or 26*26), when Google takes a tenth of that time to search *the entire Internet*. Let’s be real. Performance counts.

    Of course it does; that’s my point. Performance counts to consumers, but it comes at a cost. It is important to understand deeply what the tradeoffs are between constrained resources like time and memory, and to know what your customer finds acceptable, so that you’ll know when you’re done optimizing. In your case, you haven’t even managed to write a correct program, so optimizing it is rather premature! Customers typically do not want “wrong but fast”. — Eric

  11. Smartass says:

    Eric — touché, a normalized word can obviously map to more than one dictionary word, so the code should read:

    words.Add(line)<<" "<<orig_line;

    instead of


    Your other objections seem to be:

    1. it uses 2MB of memory

    2. it is inefficient compared to some other solution

    I’ll address these one by one. First, your program takes up 5MB of RAM the moment it’s loaded, before the search even runs. I don’t know why — maybe the command line prompts aren’t optimized in C# yet 😉

    How are you measuring that? I suspect that you might be confusing RAM with address space. They are very different. Of course my program consumes lots of address space — it loads the CLR and the framework. Who cares? Address space is cheap. The relevant measure is the scarce resource. On my desktop machine, RAM is not the scarce resource. What if it were? Or what if the dictionary size were large enough that your lookup table didn’t even fit into the address space available?  — Eric

    Second, 2 megs of RAM is cheaper than 1 second of time, that should be clear to us all. Up to a large enough limit (somewhere in the gigabytes these days), memory and time are interchangeable.

    1 second of time is ~2 billion operations.

    2 megs of RAM is ~2 million operations.

    So we have a factor of 1,000 improvement in frugality, and a factor of 1,000,000 in performance.

    I am not following your train of thought in the slightest here. 2 megs of RAM is cheaper than one second of time only when you’ve got lots of ram and little time. There are plenty of scenarios where you’ve got lots of time but are memory and storage constrained. Think cell phones, for example. When writing for cell phones the relevant metric is not time at all, but rather storage and battery consumption. — Eric

    How much faster could we get? The correct measure of program efficiency is the number of bytes output divided by time. Well, in your original example, there are 64 characters of output. If the lookup is 1.5us, the efficiency is 23 nanoseconds per character.

    No, the correct measure of program efficiency is units of useful work divided by value of resources consumed. Why do you believe that “bytes output” and “time consumed” are always the relevant metrics? The absolute utility of both the work and the resources is not determined by you or me, it is determined by our customers, who are, after all, the ones spending their money on the solution. No performance process can be successful without understanding the utility as seen from the perspective of the customer. — Eric

    Potentially, then, there is another factor of 50 or so that could be squeezed out. But by any metric, the solution is acceptable. Why do I say that? Because in the web space, where we are currently located, the usefulness is measured by how many people you could serve if you hosted your algorithm online. With a properly “performant” solution, a million times more as it turns out.

    This metric is completely artificial and has nothing whatsoever to do with my scenario. I’m writing this program for myself, I’m the only customer, and I’m not hosting a service online. If you consider only metrics relevant to a narrow set of scenarios, and artificially restrict the bounds of the problem to one that can be solved using your caching strategy, then of course your solution is superior. It is inferior when things are different.

    I could say that usefulness has nothing whatsoever to do with “how many people I can serve online”. Suppose I were to say that usefulness is defined as “how large a dictionary I can support”. Does your solution scale to a dictionary that could hold the entire human genome? If I were solving a gene sequence lookup problem instead of a scrabble rack lookup problem then your solution would not be suitable. 

     Why is “millions of times more simultaneous users” the sensible thing to optimize for and not “millions of times larger dictionary”, or “millions of times less available storage”, or any other possible direction we could push this problem? You’ve just picked one of them and enshrined it as the only sensible direction to push performance. It’s not. My point is that you’ve got to look at what the customer actually needs before you make choices on how to optimize the implementation. — Eric

    I would love it if you published a faster solution in C#. In fact, I dare you to write an anagram lookup in under 5 nanoseconds per output character using your original AALNST? example.

    You dare me? Wow. No one has dared me to do anything in about twenty-five years. That didn’t work on me when I was ten years old and it certainly doesn’t work now! — Eric

  12. BakuRetsu says:

    In an argument, the spectator learns… thanks.

  13. Smartass says:

    Eric, your ideas of memory vs. CPU power and availability need to be updated.

    A phone like the BlackBerry Bold has a 624 MHz processor and 128MB of RAM.

    An old Sony Ericsson P910 sports 160 MHz / 64MB of RAM.

    On these phones, your program would take 10x-50x the time it takes on your desktop, or 10-50 seconds even after your optimization.

    Why? My program is disk-bound, not processor-bound. Since cell phones do not have spinning metal disks, the timings are going to be very different. But, as we’ll see, that’s irrelevant. — Eric

    It’s just too slow, any way you sell it.

    First off, it’s not “too slow”. As I keep saying, as the only customer of this program on the desktop, I am the only one who gets to decide what “too slow” is, and it isn’t too slow.

    But let’s consider the cell phone scenario for a moment. In that scenario speed is not particularly important to the customer. Storage consumption and battery consumption are important to the customer. To solve this problem on a cell phone I would not use your cached-dictionary approach, for two reasons. First, it is far too memory intensive, and second, setting up the dictionary is far too processor-intensive. Rather, I would solve both problems by precomputing the dictionary into a b-tree of canonical forms. Or, rather, actually probably fourteen separate b-trees, one for each word length. That data structure on disk would make it very cheap in both storage and processor time to rapidly find matches without doing any on-phone processing of the dictionary.
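
The heart of that idea, precomputing and sorting the canonical forms so each probe touches only a logarithmic number of entries, can be sketched with a plain sorted array standing in for the on-disk b-tree; this is an illustrative simplification, not the actual design:

```csharp
using System;

static class SortedIndex
{
    // sortedCanonicalForms must be sorted ordinally, e.g. precomputed on a
    // desktop machine and shipped with the app; the device then does only
    // O(log n) string comparisons per probe and no dictionary processing.
    public static bool ContainsCanonical(string[] sortedCanonicalForms,
                                         string canonicalRack)
    {
        return Array.BinarySearch(sortedCanonicalForms, canonicalRack,
                                  StringComparer.Ordinal) >= 0;
    }
}
```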

    (Then there’s the question of how to efficiently store the “posting list” of matches. There are a number of clever things you can do to compress posting lists, but that would take me rather far afield.)

    — Eric

  14. Joren says:

    It’s miraculous that after having it so thoroughly explained to you, you still do not understand Eric’s points, Smartass.

    Time is not an absolute priority. There may be uses where other factors are more important. In any case it is vital to first establish your priorities for your specific use, and only then optimise.

  15. Dave R. says:

    If you’re suggesting that ‘performant’ means something other than a performer in a play or similar, then you’re wrong. There’s no such word. There may be references on Wikipedia or elsewhere, but they’re created by programmers who don’t seem to realise that ‘efficient’ is a perfectly good synonym. No more excuses.

    Fascinating — you have a way of knowing when there is “such a word” or not. I would love to learn how to categorize utterances into words vs non-words. Can you give me a precise definition for what makes an utterance a word vs a non-word?

    As for your claim that “efficient” and “performant” are synonyms, clearly that is incorrect. “Efficient” is defined (by me) as “consuming a small number of relevant resources per unit of useful work”. “Performant” is defined (by me) as “meeting or exceeding the measurable performance expectations of a typical user”.

    Obviously there is a relationship between being efficient and performant, but a program can be efficient without being performant, and performant without being efficient, so clearly they cannot be synonyms.

    I notice however that you have made an interesting error in logic. You are claiming that “performant” and “efficient” are synonyms, and at the same time that “performant” is not a word. But surely to be “synonyms” two things have to both be words! How can two things be synonyms if one of them isn’t even a word?  — Eric


  16. Denis says:

    To me, all these Code Wars above show just how fiercely protective we, programmers, are over our own ideas (and over our own code that realizes those ideas).

    But the same goes for our customers: they usually have their own ideas as regards what’s important in the applications they use, and what’s not. So, while some people out there may demand instant gratification and get frustrated by the slightest delay, others may put up with a long wait, so long as a funny animation, or some cool music, is played to them in the meantime.

    Of course, there are situations where objective criteria outweigh user satisfaction: if an application is designed to set off an emergency shutdown of a nuclear reactor gone bananas, and does it on time, it performs just fine, no matter what anyone thinks (they have to be alive to think it, right?). But if we don’t go to such extremes, at the end of the day, the performance criteria are actually subjective.

    I disagree — the extremes are subjective as well. We have made the subjective decision that some risks of exposure to hazardous conditions are acceptable, and some are not, and we budget accordingly.

    The hard part — particularly in your human-life-safety scenario — is how to turn subjective feelings about what is acceptable into crisp, objective requirements that we can program against. With nuclear plant shutdown software the customer has a subjective requirement — “safe enough” — and then has to turn that into an objective requirement — “software responds to outlier condition and initiates feedback-driven control in less than 20 milliseconds of sensing condition”. Is that “safe enough”? That depends on what the costs are of building the system compared to the benefit of mitigating the risk.

    And on that subject, if you ever get the chance to tour the insides of a reactor safety system, I recommend it. They’ve got some fascinating stuff in there. (I have a friend who is a safety officer at a power plant.) I was particularly impressed by the gravity-driven reactor poisoning system. Simple and effective. — Eric

    And so are our ideas about what is “performant”, “performative”, “performaniac”, or “performatilicious” 🙂

    And for those who like things formal, look into ITIL: utility: fit for purpose; quality: fit for use.

  17. Max says:

    Maybe I overlooked it, but where did you get that scrabble word list from? I would really like to try it and write a similar program, too.


  18. Peter N Roth says:


    Although searching Google for your phrase “where did you get that scrabble word list” is probably not the best place to start, I found some interesting information using the phrase “2006 scrabble tournament word list”.

    I use the word “best” in this comment thread with some trepidation; note that I will not be defending its use.

  19. RussellH says:

    Great post.  Articles like this blog entry should be repeated all over the place along with those others that you linked to.  All good points.  I see all kinds of weird code added to applications “for performance” without being measured or even understood.

    “Did you meet your goal? Great! Don’t waste any time on performance analysis.”

    I would say one exception to this is if you are using SQL to access a database and you are concatenating user strings into SQL rather than using parameters, you have to fix that, even if performance is adequate.  If the concatenation prevents query plan reuse, you can be killing the performance of other applications (or future applications) accessing the same database server.  Not to mention that this is a huge security hole — I think you’ve blogged about SQL injection before.   But even if you are building the SQL string based on something not coming from the user, and it changes from one statement to the next, it could still prevent plan reuse (Oracle is really sensitive to this), and this can kill the performance of another application sharing the same database.

    My point is that sometimes code that performs well enough still needs to be refactored for the performance impact it has on other applications.  Make sure not to test the application in isolation.

    An excellent point. Another reason why performance goals and performance tests should be based on scenarios relevant to real-world customers. — Eric
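    To make RussellH’s point concrete, here is a minimal sketch of the difference (the table, column, and variable names here are invented for illustration; this is not code from the post):

```csharp
using System.Data.SqlClient;

public static class DictionaryQueries
{
    // Concatenating user input produces a distinct SQL statement per input,
    // which defeats query plan reuse and is an injection hole:
    //   "SELECT word FROM dictionary WHERE rack = '" + userRack + "'"
    //
    // Binding a parameter keeps the statement text constant, so the server
    // can cache and reuse one plan, and the input never becomes part of
    // the SQL text.
    public static SqlCommand BuildSearch(SqlConnection connection, string userRack)
    {
        var command = new SqlCommand(
            "SELECT word FROM dictionary WHERE rack = @rack", connection);
        command.Parameters.AddWithValue("@rack", userRack);
        return command;
    }
}
```

    However many distinct racks are queried, the statement text stays identical, which is exactly what allows the plan cache to do its job.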

  20. m.c. says:


    "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth)

  21. Dave R. says:

    Look, if you’re going to use a word that isn’t in the dictionary, then it would be useful if you actually defined it beforehand. How do I know your definitions of ‘performant’ or ‘efficient’ when you don’t actually say? We’re not psychic!

    But… the whole article is about what my definition of “performant” is. That’s the point of this article. — Eric  

    Anyway, I don’t really care if it’s not synonymous with ‘efficient’. Why should I go chasing around for definitions of non-words when there are perfectly good phrases that will do, such as the one you suggested – ‘exceeds expected performance’ seems fine to me. For someone who seems to crave precision, for you to use a word that isn’t in a trusted dictionary – which I would think is the very definition of a ‘real’ word – smacks of hypocrisy.

    A few things come to mind. 

    First, now you’re making a distinction between “non-words” and “real words”. I think there is a difference between these two things. “zqsr;7bv” is a non-word. Your claim, as I understand it, is not that “performant” is not a word. Rather, that it is a word, just not a “real word”.

    Second, sure, why use a word when a phrase will do? The French, for example, have no “real” word for “sandwich”. They are forced to use the equivalent phrase to “some meat and cheese between two slices of bread“. True fact! The French Academy has the right idea; why pollute the language with ridiculous neologisms named after English gambling addicts when there are perfectly good short phrases that you can use instead?

    And third, finally we come to your definition of what makes a word “real”. “Real” means “found in a standard dictionary” according to you. What are the implications of this definition of “real”?

    Well, for one thing it implies that English had no “real” words in it at all before 1755, when the first dictionary that could reasonably be called “standard” was published. English only had “unreal” words in 1754! How fascinating!

    Another interesting consequence of this definition is that all recent neologisms somehow made the transition from “unreal” to “real”. I remember a day when “yuppie” was not in any standard dictionaries, and yet I very clearly knew what it meant. Was the widespread use of this “unreal” word in the media a bad thing? Was its promotion to “real” word a bad thing?

    I was curious about your unusual definition of “real”, so I looked up “real” in some standard dictionaries. None of them listed “found in a standard dictionary” as a standard definition of “real”.

    So I’m afraid I must turn this around on you and tell you that your use of “real”  is not by your definition a “real usage”. And isn’t that ironic? It’s like ten thousand spoons when what you really need is a fork!

    (One wonders whether your “unreal” usage of “real” also “smacks of hypocrisy”, but I really care very little about hypocrisy.)

    — Eric

    I don’t mind the other common Microsoft turns of phrase, like beginning every spoken sentence with “so”, using “super” too often and using “ask” as a noun, but “performant” is an exception because it’s undefined and acts as a barrier to understanding. The very fact that you provided your own personal definition of it in your comment speaks volumes.

    Hey, when I use a word it means exactly what I intend it to mean. Me and Humpty Dumpty.

    And please tell me, precisely what volumes does the fact that in your comment you provided your own personal definition of “real” speak? 

    Does your non-standard usage of “real” act as a barrier to understanding?

    And how do you suppose we non-psychics are supposed to know what your definition of “real” is?

    — Eric


    “Not a real word”:


    Perhaps, given that the word “real” doesn’t really mean what you think it does, the argument we should be having is about whether “performant” is a cromulent word. I believe it is. Perfectly cromulent word, performant. It embiggens the language. — Eric

  22. Brett says:

    I’ve had the "Real Word" discussion many times in the past, and adopted a simple philosophy: words are simply a means of communicating ideas or emotions from one person to another. If I say it and you understand it, it’s a word. Eric, I understood exactly what you meant by ‘performant’, therefore it’s a word. And there ain’t nothin’ you can say to change my mind!

    If circumstances require precise understanding and minimum ambiguity, then the parties involved can agree on definitions, either directly or via reference; this happens all the time in legal documents.

  23. Bill Richards says:

    A very nice article, thank you Mr Lippert, and as always it appeared on-line just when it was pertinent to my daily work. It seems that after all you must be a psychic!

    A little off topic I know, but I sense that someone (begins with Dave and ends with R.) is really talking from somewhere that most people use for other, more base, functionality!

    Words tend to make it into a dictionary once they can be cited from multiple sources, whilst having a common intended meaning.

    A good article on how words make it into the Oxford English Dictionary can be found here

    But this in itself does not make a word real or unreal – or maybe it does, because we still do not know what criteria must be fulfilled to make a word either one.

    I, and several people I have interlocutionary acts with, can be heard using the word “munter” from time to time, and as far as I know it is not in any dictionary, but when people who do not know this word hear it being used, they do not say “you cannot use that word, it’s not in the dictionary”, they simply ask “what does that mean?”

    Your article quite clearly defines your usage of the word “performant”, and therefore I find your usage of that word to be performant in those terms, as I’m sure do many.

    Language is not set in stone, be it a natural language or an artificial one – do we argue that you can’t use the keyword “dynamic” in .net 4.0 just because it is not defined in .net 3.0?

    I’m not sure what volumes it speaks, but it certainly does speak volumes (I believe according to Dave’s volumes at least)

    What did the Red Queen say? “A word means exactly what I mean it to mean, at exactly the time I mean it to mean it”, or something along those lines, but certainly on this topic we most definitely are “running to stand still”.


    “When I use a word,” Humpty Dumpty said in a rather a scornful tone, “it means just what I choose it to mean – neither more nor less.”
    “The question is,” said Alice, “whether you can make words mean different things.”
    “The question is,” said Humpty Dumpty, “which is to be master – that’s all.”


    And as Columbo said so many times before, “there’s just one more thing …” NOBODY likes a smartass. I just love it when people are so far up the place that Dave R. speaks from that they can’t use their real name in a post … if someone thinks he’s great, he is for the most part quite tiny!

    I make mistakes and am proud of it – how else would I learn? Notice that I’m proud that I make mistakes, I’m not proud OF them. Admitting you are wrong is a very difficult thing to do when you never are, and learning is very difficult to do when you already know everything.

    Why do people who know everything already bother reading blogs posted by mere mortals?

  24. John K says:

    Great article, even better ripostes.

    It seems to me though, Eric – if I may call you Eric? – that the ability of customers to give clear performance requirements would make our jobs easier.  So just to summarize, you’re saying that the time it takes to move the code from acceptable to er… acceptable is wasted time? Surely not. 😉

  25. Doug says:

    I agree with the majority of your post (especially test first, assume never), and I appreciate that you are bringing this underappreciated topic into the light. However, in every application I have worked on performance *is* functionality. Granted, I write ‘scientific computing’ type code in video processing & mathematical modelling, but in every case a faster algorithm gives me more freedom to implement automated optimisation or generate more accurate results through faster polling. In my experience, if someone claims that current computers / a given algorithm are ‘fast enough’, I conclude that they are not thinking hard enough about what their application *could* do for their customer.

    It cannot do anything for the customer if it never ships. Yes, everything could always be better, but you can’t let that stop you from shipping it when it is good enough. And how do you know when it is good enough? — Eric

    Do you want to be renowned for ‘bare minimum’, or ‘sheer genius’? Granted, this does not apply to nearly trivial cases of applications with clearly defined, limited scope and functionality. If you’re writing a notepad for your granny, why are you reading a blog about performance anyway?

    Aside: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” is valid for a sufficiently constrained value of ‘small’ but is trite and misleading, say 97% of the time.

    Bottom line, if functionality != performance, you’re not being creative enough.

  26. Frog says:

    Doug: Your argument doesn’t generalize at all.  Pick another highly performance-centric area: video games.  If (and it’s not unheard of) you blow 90% of your development time on engine optimization and have to squeeze gameplay into the remaining 10%, there’s no way you hear ‘sheer genius’ or ‘creative’ describing you or your product, you just hear derision!  Optimization should be about everything, not just speed.

  27. David Walker says:

    I picked a different approach.  Since you have to iterate the entire dictionary anyway, I wrote a matching function that compares the pattern to each word of appropriate size, rather than pre-computing all possible variations.  The ReadAllLines version below runs in just over 100 ms on my machine and the version using your original method of reading lines runs in around 400 ms.  (The difference being that the disk access is not happening during the timing, not that one method is faster than the other)

    I don’t guarantee that it works perfectly but in the checks I have made, it appears to be working properly.

        private static IEnumerable<string> SearchDictionary(string Letters)
        {
            char[] S1 = CanonicalizeToArray(Letters.Replace("?", ""));
            return from line in FileContents
                   where line.Length == Letters.Length || line.Length == (Letters.Length + 1)
                   where CompareString(S1, CanonicalizeToArray(line))
                   select line;
        }

        private static string[] FileContents = System.IO.File.ReadAllLines(@"C:\projects\scrabble_word_finder\twl06.txt");

        private static char[] CanonicalizeToArray(string s)
        {
            char[] chars = s.ToUpperInvariant().ToCharArray();
            Array.Sort(chars);
            return chars;
        }

        private static bool CompareString(char[] S1, char[] S2)
        {
            // S1 is the sorted rack without wildcards; S2 is the sorted word.
            int Wilds = S2.Length - S1.Length;
            int Counter1 = 0;
            int Counter2 = 0;
            while (Counter1 < S1.Length && Counter2 < S2.Length)
            {
                if (S1[Counter1] == S2[Counter2])
                {
                    // Letters match; consume one from each array.
                    Counter1++;
                    Counter2++;
                }
                else
                {
                    // Both arrays are sorted, so if the rack's current letter
                    // sorts before the word's, the word lacks that rack letter.
                    if (S1[Counter1] < S2[Counter2])
                        return false;
                    // The word has a letter the rack lacks; spend a wildcard on it.
                    Counter2++;
                    Wilds--;
                    if (Wilds < 0)
                        return false;
                }
            }
            // Rack exhausted with no mismatches; trailing word letters are covered by wildcards.
            if (Counter1 + Wilds == S2.Length)
                return true;
            // Rack letters remain that never found a home in the word.
            if (Counter1 < S1.Length)
                return false;
            return true;
        }


  28. Clinton Pierce says:

    A while ago I had a similar need.  I needed a routine to validate, while the user typed, whether what was being typed could possibly be a word (given a limited dictionary).

    So that as they type:

    A       [Valid]

    AP     [still possible]

    APE   [Valid word, and a root for others…]


    And came to the conclusion that the slow performance was in looking up the (possible) word.  What I wound up doing was writing another program to take the dictionary and serialize it as a trie, and compiling it into the EXE as an embedded resource.  I’m sure it was horribly inefficient for disk space and would have won me no friends at a code review… but it worked.  Splendidly.  The customer’s needs (mine) were met and I got to write a string trie implementation in C#.
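    Clinton’s trie idea can be sketched roughly like this (a hypothetical minimal implementation, not his actual code; his serialization into an embedded resource is omitted here). Each node maps a character to a child node, and a flag marks complete words, so "is this still a possible word?" is answered as the user types:

```csharp
using System.Collections.Generic;

public class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node current = root;
        foreach (char c in word)
        {
            Node child;
            if (!current.Children.TryGetValue(c, out child))
            {
                // First time we've seen this letter at this position; grow the trie.
                child = new Node();
                current.Children[c] = child;
            }
            current = child;
        }
        current.IsWord = true;
    }

    // Walk the trie along s; null means no dictionary word starts with s.
    private Node Find(string s)
    {
        Node current = root;
        foreach (char c in s)
        {
            if (!current.Children.TryGetValue(c, out current))
                return null;
        }
        return current;
    }

    // "Could this still become a word?" — e.g. "AP" when "APE" is in the dictionary.
    public bool IsPrefix(string s) { return Find(s) != null; }

    // "Is this a complete word?" — true only if a word ends exactly here.
    public bool IsWord(string s)
    {
        Node node = Find(s);
        return node != null && node.IsWord;
    }
}
```

    Each keystroke then costs one dictionary lookup per typed character, independent of the dictionary size, which is why the lookup stopped being the bottleneck.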

  29. Stuart Leeks says:

    I read an interesting post on Eric Lippert’s blog recently where he was using LINQ to Objects to find

  30. Smartass et al. are missing the point…

    All the counter-arguments that have been made rely on assumptions about what is "Good Enough".

    However, the customer (in this case, Eric) defines what is "Good Enough", and the programmer (us) works to that specification. In real-world software development (where deadlines and timesheets and chargeable time matter), it is wrong to spend even an hour "optimising" past that point.

    I’ve never experienced a performance problem that I couldn’t have thrown hardware at if I chose to. Sometimes we’ve done that, sometimes we’ve spent time on performance analysis – based on our judgement about which was cheaper.

    What Eric is trying to say is that there’s always a trade-off – speed for memory, speed for size, speed for development time, speed for readability, speed for maintainability, speed for stability – and that’s a) not a trade-off that you always want to make, and b) most likely not a decision you should be making without consulting your client.

    "Hi, Mr Project Manager! I’m gonna deliver two days late because I want this function to run in 5 nanoseconds instead of 0.2 seconds, even though the spec says that ‘a few seconds’ will do. That ok?"

    Most relevantly, we take a rapid-development approach. I’ve written some shamefully poorly-performing code (I’m looking at you, LINQ-to-SQL Deferred Loading…) in the interests of getting our client hands-on with a system in-development sooner. "Good Enough" for a first-contact demo, or for UAT is quite different to "Good Enough" for production – it can help your project run more smoothly if you can do your performance optimisation later in the project (i.e. after you know that you need it), and deliver other things earlier. More than once it’s turned out that my shameful code has been "Good Enough". At that point it’s my job to work on something else, instead of spending valuable time fixing something that isn’t broken.

    I don’t think I’d be exaggerating if I said "most" of our code doesn’t even make it to production. Imagine what a waste of time it’d be if I "optimised" *all* my code for speed…

  31. Lee says:

    Great article. Given that so many of the comments show the authors clearly miss the point of the article, it’s no wonder that so many projects exceed budget and are delivered late, if ever. More often than not, the best optimization you can make at any given time is to make none at all and move on to the next task.

    It is likely that I’m just not very good at explaining myself clearly. Writing well is hard, as it turns out! — Eric

    BTW, given that the word that’s “not a word” fits the textbook definition (’s definition, anyway) of a word, it’s pretty hard to argue that it’s not a word. And given that nearly everyone intuitively knows what the word means on first sight, regardless of how well liked the word is, “performant” is apparently quite performant!

  32. Brett J. says:

    I’d just like to provide another example of when CPU doesn’t matter.  I’m a corp wage slave ASP.NET webforms developer, and I like to write fast C# code.  Unfortunately, I’ve found that no user has ever noticed just how fast my C# code on the server side of a webform page runs when they’re accessing the site from Norway to Houston over a VPN connection.  It wouldn’t matter diddly if I was using Smartass’ code or not.  They do often notice when I optimize things like page size, or ViewState size, or SQL stored procedures.

  33. Seb C. says:

    Very interesting Eric.  Actually learnt quite a few things.  Ironically though, you attack one of the important parts of the storage mechanism while another, easier (in my mind anyway) solution is staring at you.

    At any given point in time, the list of racks you pass to your search query is a list of strings, all of the same length.  I would think the following might do a good bit to reduce the I/O part and maybe get another 20% performance gain.

    First, I would sort the word file by length/alpha (the alpha is probably useless – more to do with having a "clean" file)

    Then I would have:

        return from line in FileLines(dictionary, originalRack.Length)
               where racks.Contains(Canonicalize(line))
               select line;

     private static IEnumerable<string> FileLines(string filename, int rackLength)
     {
         using (var sr = File.OpenText(filename))
         {
             while (true)
             {
                 string line = sr.ReadLine();
                 if (line == null)
                     yield break;
                 // Stop reading any further once the words are longer than the
                 // current rack - this reduces file I/O.
                 if (line.Length > rackLength)
                     yield break;
                 // Only yield lines of the appropriate length - this limits the
                 // number of records the Contains check sees (though that cost is
                 // now negligible since you're using a HashSet).
                 if (line.Length == rackLength)
                     yield return line;
             }
         }
     }




    On a last note, I would say that the oversight was relatively easy to make.  As much as I do love the convenience of LINQ, I did cringe the first time I saw the feature.  Being mostly a database warehousing person, I find that the mindset needed to deal with large amounts of data is usually very different from the mindset for object-oriented programming that most developers are used to.  LINQ is both a wonderful tool and a dangerous double-edged sword: through convenience, it is easy to forget the #1 rule in data warehousing: reduce the size of the input you’re working on.

    In this case the "where line.Length == originalRack.Length" clause in your original version is perfectly correct and functionally equivalent to my proposal, and it is beautifully simple.  But it unfortunately doesn’t "seamlessly travel" to the underlying data system to actually reduce the number of scanned records.

    By sorting the dictionary file by word length and moving the length check inside the function we can pretty easily squeeze some more performance (though you most certainly got most of it with the HashSet).

    I would be curious to know how much we can gain.  I had a much smaller dictionary file on hand (340k words), but statistically only 1/3 of the words were of a length under 8 characters (7 and 7+1), so this should reduce your I/O by 2/3.

    But very good and useful article, thank you!

  34. I read an interesting post on Eric Lippert’s blog recently where he was using LINQ to Objects to find

  35. CHP says:

    Interestingly, “performant” is a common German word, which means: with good performance.

    It’s not a bad thing to introduce new words from other languages; in Germany we do that all the time.
