// If this happens, I am going to quit and become a barista


While chasing down a bug, I ran across this comment:

// Arbitrary cap on message length.
// If you change the format string, then update this to match.
// (Although, if we ever need to place a million icons on the desktop,
// I am going to quit and become a barista.)

But the buffer was not big enough. Should I suggest to the developer that they check if Starbucks is hiring?

The buffer wasn't big enough because the format string was incorrect.

constexpr wchar_t formatString[] = L"(%f3.3,%f3.3)";
constexpr size_t worstCaseFormat = ARRAYSIZE(L"(000.000,000.000)");

I guess they never looked at their log file, because the format string is wrong. For an icon at position (10, 10), the resulting log message is (10.0000003.3,10.0000003.3), which is longer than the allotted worst-case string.

That's because the format string %f3.3 is interpreted as

  • %f: A formatted floating point number, with a default of six places after the decimal.
  • 3.3: The literal characters 3.3.

The format string was intended to be %3.3f, which means a formatted floating point number with a minimum of three characters of output and exactly three places after the decimal.

That would result in (10.000,10.000).

Note that if you ask for three places after the decimal, then you're going to get at least four characters of output anyway, so the first 3 in %3.3f is meaningless. It doesn't cause any harm, but it doesn't do anything either. If they wanted a fixed-width format, then they could have used %7.3f, which would space-pad the value on the left to ensure three characters before the decimal.

Comments (30)
  1. Brian_EE says:

    [quote]Should I suggest to the developer that they check if Starbucks is hiring? [/quote]
    No. The comment was clearly if you needed a million icons on the desktop. Not if he made a coding error.

    1. Stefan Kanthak says:

      He also gets an malus point for abusing ARRAYSIZE(L””) instead of sizeof(“”)

      1. But the output is a Unicode string, so we should measure the Unicode string. The string lengths of ANSI and Unicode strings need not be equal.

        1. Stefan Kanthak says:

          The constant string he used to let the compiler evakuate the length but holds only ASCII characters, ie. no character which occupies more than a single code point.
          JFTR: if the routine to interpret the format string would emit more than single code point for some characters both variants were susceptible to buffer overflows.

          1. It holds only ASCII characters today, but maybe not tomorrow.

    2. D says:

      Thank you for your sane comment. Too often people are lost in language over function.

  2. haha says:

    If he becomes a barista, he can make my coffee order wrong like he did with the code. ;)

  3. Not even %7.3f is fixed width. Large inputs like 1e10 will give you outputs like “10000000000.000”.

    1. If the input is 1e10, then it’s time to become a barista.

      1. Kevin says:

        Or just round to “inf”. IEE-754 binary16 (half-precision) does it on numbers far smaller than 1e10, if you’re unfortunate enough to be using 16-bit floating point numbers. However, while the C standard’s definition of floating point is notoriously lax, I think it does define a minimum range larger than this, so your compiler isn’t allowed to just silently shrink all of your floats out from under you. That’s probably a Good Thing.

  4. Baltasar says:

    The first error is clear enough: the field width info should be in-between the ‘%’ and the ‘f’. I guess it was some kind of typo. The second one is very widespread: many people think that the number left of the dot is the number of positions of the integral part of the number, while it is the total width, inluding the dot itself.

    1. Marcos Kirchner says:

      Funny how that happens a lot in the database world as well. DECIMAL(5, 2) means five digit precision in total, of which two are after the decimal. People tend to confuse with 5+2 digits.

  5. Joshua says:

    I really hope the programmer was female, because otherwise we can stack a grammar error on top of the format string error.

    1. AP says:

      What’s the grammar error? There’s no such thing as a “baristo,” if that’s what you mean.

    2. Why? Surely you don’t think that a barista has to be female?

    3. DWalker07 says:

      Joshua, I don’t understand this comment.

  6. AsmGuru62 says:

    So, the formatted string was never seen in a Watch panel in debugger.
    Interesting.

  7. colin stevens says:

    I hope we don’t need a floating point number with 3 digits after the decimal to represent the number of icons on the desktop

  8. Ivan K says:

    I’m not sure if McCafé was a thing when that comment was written, but I’d like to imagine the ex-programmer coming face to face with a “Billions and Billions Served” sign on the first day of a new job. (Even though the format arg error isn’t related to the code comment, as Brian EE said in the first blog comment.)

  9. cheong00 says:

    [I am going to quit and become a barista]
    The version for Taiwan people is “I am going to quit and sell chicken fillet”. :P

    And yes, hawkers selling deep fried / roasted chicken fillet can earn more than average people with programming job.

  10. cheong00 says:

    Btw, shouldn’t it be something that caught by unit testing? If there exists an edge case where a large array of random generated coordinates are spilled into the function, it should throw an exception / runtime error.

  11. Yuri Khan says:

    Surely you don’t need three decimals for the coordinates of a desktop icon? On the other hand, probably want at least four digits for the integral part.

    1. The units are in icons, not pixels. 1000 icons across and 1000 icons down is a million icons. But if you turn off “Align to grid”, you can put icons at fractional positions.

  12. John Styles says:

    One would have hoped that whichever of K & R came up with the formatting in C (someone will now tell me it comes from BCPL or the programming language defined in the Dead Sea Scrolls [Dead C Scrolls?]) could have found someone in Bell Labs who did some sort of application programming to tell them how format statements need to work, maybe someone who wrote Fortran or COBOL?
    People always cite the creator of make not changing the tabs into spaces because he had a dozen users as causing the most wasted time of that decision of all time, but the dysfunctional C formatting semantics seems worse to me – I have never worked on a scientific / technical system where we haven’t had to write our own replacement or more usually wrapper code round it to catch the cases where the length isn’t what you would expect.
    (Obviously there are philosophical errors in programming languages one can argue about but this just seems a mistake that could have been contained locally in the relevant routine(s))

    1. Peter Doubleday says:

      Pretty darned sure it wasn’t BCPL (although I suppose I could download the compiler from Martin Richards’ site and experiment with it). Pretty darned sure because the fundamental types in C (char, short, int, long, float, double) don’t match the fundamental types in BCPL, and C formatting is closely tied to the fundamental types. I could be wrong on either count, of course.

      What is fascinating about this is that C++ tried really, really hard to be as backward-compatible to C as humanly possible, except in one notable area – input/output. Whatever you might say about C++ streams, they are at least in theory a lot easier to use if you want predictable precision and so on.

      And what’s even more interesting (to me as a historian of CS) is that almost everybody in commercial C++ programming went straight ahead and ignored the C++ IO system in favour of the obviously broken C model.

      1. Matteo Italia says:

        > What is fascinating about this is that C++ tried really, really hard to be as backward-compatible to C as humanly possible, except in one notable area – input/output. Whatever you might say about C++ streams, they are at least in theory a lot easier to use if you want predictable precision and so on.

        Uh, not really; each and every time I tried to use them, there were bizarre corner cases (the precision flag behavior is particularly hideous), modifiers that stuck when I didn’t expect them to and the other way around. Also, its syntax is insanely verbose and completely detached from how the output will look on screen. It’s no surprise that virtually any other language used placeholder-based formatting (but type-safe, unlike the C predecessor).

        > And what’s even more interesting (to me as a historian of CS) is that almost everybody in commercial C++ programming went straight ahead and ignored the C++ IO system in favour of the obviously broken C model.

        iostream is a lethal mix of badly designed stuff, it’s not surprising that virtually everybody stayed away from it. They solved a few problems (type safety in particular), made formatting somewhat safer but IMHO way more inconvenient (see paragraph above), and overengineered everything else (which came with its own performance problems), not meeting any practical objective.

        – streams interface and extension is black magic; how many C++ programmers do you know that know off the top of their heads what is a stream “sentry” object and why they may ever need it? or how the locale stuff (that filters all data going through the streams) actually work (bonus points if he/she actually knows what a “facet” is, and that a locale is ultimately a type-erased container of unrelated types)?
        – confusing interface = widespread errors; the formatted/unformatted input methods mixed all in the same cauldron made its fair share of victims (extra leftover whitespace, anyone? again, how many people do you know that know about std::ws?); std::endl? while(!is.eof())?
        – all this complexity and magic entities (like the IO manipulators) don’t really buy in any flexibility – if I wanted to e.g. add a manipulator to output numbers in base 36 I wouldn’t even know where to start (std::hex & co. just set fixed flags into the stream object, that handle some arbitrary formatting – essentially, what was already supported by printf’s format strings);
        – decoupling the underlying data source/sink from formatting was an important advancement, but the standard library itself hides this surprisingly well; fstream/ifstream/ofstream, their string counterparts… WTF? all you need is an istream/ostream/iostream + filebuf/stringbuf – if only the base stream classes had some sensible policy to handle their underlying streambuffers lifetime…;
        – stream buffers handle buffering, data sink/source interface and encoding conversion, all in the same class, while they should be separate concerns to be handled by separate entities; the resulting interface for implementers is a complete mess (plus some extra assholeries thrown in for free, such as the “extra character” thing in the underflow/overflow calls); I have yet to manage to write a streambuf that handles seeks correctly;
        – code page conversion, which, in a decently made library could be just another filter, orthogonal to most of the rest of the library, is yet another pervasive fuckup (and don’t get me started with the whole wide chars mess).

        I could go on; the complete mess that is iostream made some victims in an important codebase of ours as a coworker of mine decided that streambufs where “the right thing”; he spent a week trying to implement a more civilized interface on the top of streambuf (when we wasted hours navigating in a mess of confusingly-named protected virtual functions called by slightly-differently-named public nonvirtual functions), implemented some filtering streambufs (zlib, http chunk, bzip2, …), then we spent the next two months debugging the problems that arose due to bizarre corner cases allowed by the streambuf spec.

        This to say: it’s not surprising that people don’t use this drek; what is surprising is that anybody deemed it decent enough to include it into the C++ standard.

  13. Dave Bacher says:

    You can never fix this though, because some obscure program depends on the stack hammer to function correctly.

Comments are closed.

Skip to main content