What does style look like, part 7


Over the course of this series on style, I’ve touched on a lot of different aspects, today I want to discuss aspects C and C++ style specifically.

One of the things about computer languages in general is that there are often a huge number of options available to programmers to perform a particular task.

And whenever there’s a choice to be made while writing programs, style enters into the picture.  As does religion – whenever the language allows for ambiguity, people tend to get pretty religious about their personal preferences.

For a trivial example, consider the act of incrementing a variable.  C provides three different forms that can be used to increment a variable. 

There’s:

  • i++,
  • ++i,
  • i+=1, and
  • i=i+1.

These are all semantically identical, the code generated by the compiler should be the same, regardless of which you chose as a developer (this wasn’t always the case, btw – the reason that i++ exists as a language construct in the first place is that the original C compiler wasn’t smart enough to take advantage of the PDP-8’s built-in increment instruction, and i++ allowed a programmer to write code that used it).

The very first time I posted a code sample, I used my personal style, of i+=1 and got howls of agony from my readers.  They wanted to know why on EARTH I would use such a stupid construct when i++ would suffice.  Well, it’s a style issue :)

There are literally hundreds of these language specific style issues.  For instance, the syntax of an if statement (or a for statement) is:

if (conditional) statement

where statement could be either a single line statement or a compound statement.  This means that it’s totally legal to write:

if (i < 10)
    i = 0;

And it’s totally legal to write

if (i < 10)
{
    i = 0;
}

The statements are utterly identical from a semantic point of view.  Which of the two forms you choose is a style issue.  Now, in this case, there IS a fairly strong technical reason to choose the second form over the first – by putting the braces in always, you reduce the likelihood that a future maintainer of the code will screw up and add a second line to the statement.  It also spaces out the code (which is a good thing IMHO :) (there’s that personal style coming back in again)).

Other aspects of coding that ultimately devolve to style choices are:

if (i == 10)

vs

if (10 == i)

In this case, the second form is often used to prevent the assignment within an if statement problem – it’s very easy to write:

if (i = 10)

which is unlikely to be what the developer intended.  Again, this is a style issue – by putting the constant on the left of the expression, you cause the compiler to generate an error when you make this programming error.  Of course, the compiler has a warning, C4706, to catch exactly this situation, so…

Another common stylistic convention that’s often found is:

do {
    < some stuff >
} while (false);

This one exists to allow the programmer to avoid using the dreaded “goto” statement.  By putting “some stuff” inside the while loop, it enables the use of the break statement to exit the “loop”. Personally, I find this rather unpleasant, a loop should be a control construct, not syntactic sugar to avoid language constructs.

Speaking of goto

This is another language construct that people either love or hate.  In many ways, Edsger was totally right about goto – it is entirely possible to utterly misuse goto. On the other hand, goto can be a boon for improving code clarity.  

Consider the following code:

HRESULT MyFunction()
{
    HRESULT hr;

    hr = myCOMObject->Method1();
    if (hr == S_OK)
    {
        hr = myCOMObject->Method2();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method3();
            if (hr == S_OK)
            {
                hr = myCOMObject->Method4();
            }
            else
            {
                hr = myCOMObject->Method5();
            }
        }
    }
    return hr;
}

In this really trivial example, it’s vaguely clear what’s going on, but it suffices.  One common change is to move the check for hr outside and repeatedly check it for each of the statements, something like:

    hr = myCOMObject->Method1();
    if (hr == S_OK)
    {
        hr = myCOMObject->Method2();
    }
    if (hr == S_OK)
 

What happens when you try that alternative implementation?

HRESULT MyFunction()
{
    HRESULT hr;

    hr = myCOMObject->Method1();
    if (hr == S_OK)
    {
        hr = myCOMObject->Method2();
    }
    if (hr == S_OK)
    {
        hr = myCOMObject->Method3();
        if (hr == S_OK)
        {
            hr = myCOMObject->Method4();
        }
        else
        {
            hr = myCOMObject->Method5();
        }
    }
    return hr;
}

Hmm.  That’s not as nice – some of it’s been cleaned up, but the Method4/Method5 check still requires that you indent an extra level.

Now consider what happens if you can use gotos:

HRESULT MyFunction()
{
    HRESULT hr;

    hr = myCOMObject->Method1();
    if (hr != S_OK)
    {
        goto Error;
    }
    hr = myCOMObject->Method2();
    if (hr != S_OK)
    {
        goto Error;
    }
    hr = myCOMObject->Method3();
    if (hr == S_OK)
    {
        hr = myCOMObject->Method4();
    }
    else
    {
        hr = myCOMObject->Method5();
    }
    if (hr != S_OK)
    {
        goto Error;
    }
Cleanup:
    return hr;
Error:
    goto Cleanup;
}

If you don’t like seeing all those gotos, you can use a macro to hide them:

#define IF_FAILED_JUMP(hr, tag) if ((hr) != S_OK) goto tag
HRESULT MyFunction()
{
    HRESULT hr;

    hr = myCOMObject->Method1();
    IF_FAILED_JUMP(hr, Error);

    hr = myCOMObject->Method2();
    IF_FAILED_JUMP(hr, Error);

    hr = myCOMObject->Method3();
    if (hr == S_OK)
    {
        hr = myCOMObject->Method4();
        IF_FAILED_JUMP(hr, Error);
    }
    else
    {
        hr = myCOMObject->Method5();
        IF_FAILED_JUMP(hr, Error);
    }

Cleanup:
    return hr;
Error:
    goto Cleanup;
}

Again, there are no right answers or wrong answers, just choices.

Tomorrow, wrapping it all up.

Comments (37)

  1. Siggma says:

    And now I’ve read the lot… A stange habit I noticed in reading PERL code, and even Delphi, would be to use ‘lazy logical operators’, something like this:

    if(Method1()==S_OK && Method2()==S_OK && (Method3() && !Method4()))Method5();

  2. Man, how could I have forgotten ++i;. Doh!

    Fixed.

  3. Siggma,

    Interestingly enough, I’ve seen the lazy && thing done before. And I’ve seen it screw up hideously.

    Every time I’ve seen someone try this, it was to try to be clever, and they’ve almost always gotten it wrong. At a minimum, using constructs like this insert stumbling blocks when reading code – you have to stop and puzzle out what the author was trying to do, and if they got it right.

  4. Scorponok says:

    The ‘lazy evaluation’ style might look big and clever, but whenever I’ve done it before I’ve had horrible problems when debugging – it’s just too difficult to find out which operation failed without stepping into each individual one. With Larry’s method, you can just step over each function call and check the return value. I tend to write kinda verbose code for the same reason – it’s so much easier to debug if you’re not calling 6 functions on the same line.

  5. Shmuel Baron says:

    IMHO, the goto statement is very useful when writing COM-style code that acquires/releases interface pointers, when you don’t have stuff like CComPtr available. Pyramid-style code (like Larry’s first MyFunction version) is just horrible and I really can’t stand it. :)

    Ofcourse, I’d rather just use C++ construction/destruction semantics (i.e. smart pointers) to ‘flatten’ the code for stuff like that, but sometimes it’s not possible.

    I’ve seen some code example on MSDN that did something similar but instead of using goto they used __try, __finally and __leave.

    Kinda weird if you ask me.

    I’ll try to find the URL.

  6. Drew says:

    Note that choice of prefix or postfix increment in C++ can have performance implications if your variable happens to be an object that overloads these two operators (iterators are a good example). While prefix increment can simply change the internal state of the object, postfix increment must first create a copy of the object, change internal state, then return the copy.

  7. __try/__finally/__leave are fascinating constructs. In particular, it’s unbelievably expensive to return from inside a __try block – it takes literally tens of thousands of instructions on non x86 platforms. So most developers use a "goto-the-finally-clause" construct (usually hidden by a macro).

  8. Of course there’s an easier way to do what the KB recommends:

    Simply call CheckTokenMembership(mytoken, CreateWellKnownSid(Administrators), &isadmin).

  9. Mike Dimmick says:

    That do/while trick caught me out for about two hours not so long ago. I really had to stare hard to work out what it was up to. Use a goto – it’s clearer.

    I’ve got into the habit of using ++i from some STL iterator code. When you try to implement operator++(int) (the equivalent of i++) you realise that in order to provide the correct semantics, you have to copy the object, increment the original, then return the copy. There’s a degree of inefficiency here. The compiler is allowed to elide (remove) temporaries, but not anything you actually declared, IIRC.

    Having said that, I rarely have more than one assignment operator in a single statement, and I very rarely use expressions containing ++. I normally use it in a line on its own, or in a for statement’s third controlling expression. I also rarely use the comma operator, especially in for statements. I tend to limit myself to simple for statements in C-based languages, where a single variable is initialised in the first part, tested in the second, and modified in the third (this does include linked-list walking, not just integer variables). Anything more complicated belongs in a while, do/while, or for(;;)/break structure, depending on whether the test is needed at the top, bottom, or somewhere in the middle.

  10. Mike, I agree with your comments pretty much 100% – I also rarely use comma, etc.

  11. Mike Dunn says:

    I personally like the do { } while(false) construct, just so I can avoid being yelled at for using goto.

    One thing I hate is macros that hide jumps. There is some sample code in the WMP Format SDK that has macros with whacky names (such as CPRg and COrg) which test function return values, set specially-named local variables, and then jump to specially-named labels. *shiver*

  12. John Weston says:

    >> As does religion – whenever the language allows for ambiguity, people tend to get pretty religious about their personal preferences.

    Not sure what you are implying here. But, I’ll use this opportunity to share the truth.

    "For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life." (Jhn 3:16)

  13. John,

    Religion in this case means an unswerving faith in the correctness of your belief.

    For instance, if I say that ++i is better than i++ (or do/while(false), or if () <statement>, without facts to back it up, it’s a religious argument.

  14. 11/17/2004 12:20 PM Larry Osterman

    > Man, how could I have forgotten ++i;. Doh!

    Because there are still people who code things like

    a[i++] = b;

    or

    if (++i < n) { ….. }

    Even if you don’t code such monstrosities yourself, you still need to remember that i++ and ++i are not equivalent.

    11/17/2004 4:12 PM John Weston

    > But, I’ll use this opportunity to share the

    > truth.

    Be sure not to buy oil from Muslim countries because it would be offensive to your own beliefs. Also don’t buy a Japanese or Chinese TV, and don’t buy an Intel processor if it was designed in a country you consider false.

  15. Chris Nahr says:

    By the way, the boolean constant in the endless while loop should be true, not false. Otherwise the loop would be very endful. :)

  16. GlebD says:

    From Herb Sutter’s Guru of the Week (http://www.gotw.ca/gotw/002.htm, solution #3): "Preincrement is more efficient than postincrement, because for postincrement the object must increment itself and then return a temporary containing its old value. Note that this is true even for builtins like int!" So, ++i is more efficient than i++, and this is not a style issue.

    Regarding COM, I often use code like this:

    HRESULT hr = SomeFunction();

    if (FAILED(hr)) _com_raise_error(hr); // throws _com_error exception

    With this I’m free to process the exception wherever I like, even in the same function, providing single exit point and making the code much nicer.

  17. <i>btw – the reason that i++ exists as a language construct in the first place is that the original C compiler wasn’t smart enough to take advantage of the PDP-8’s built-in increment instruction, and i++ allowed a programmer to write code that used it</i>

    This is actually incorrect. See: http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

    In particular:

    "People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few `auto-increment’ memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1."

  18. Imperfect says:

    You gotta watch it with telling people that "i++" and "++i" are equivalent, because they’re not. It’s been brought up before that people can write horrific code like:

    a[i++] = b;

    Which is true, I do see that sometimes still, and it is a dangerous line, but! How about basically every "for" loop out there? The following two lines are not equivalent:

    for (i = 0; i < 10; i++);

    for (i = 0; i < 10; ++i);

    The first one will loop from 0 to 9. The second will loop from 1 to 9.

  19. Mike Dunn says:

    Chris> while(false) is correct, because it’s not an infinite loop, it’s a one-time loop. The condition in a do loop is tested at the end of the loop, so the result is the code in the do block is executed once.

    If __try/__finally were actually usable in C++, I imagine most people would use that instead of the funnier-looking do-while(false).

  20. Centaur says:

    @GlebD re: if (FAILED(hr)) _com_raise_error(hr);

    Borland Delphi has a couple of functions, Win32Check and OleCheck. Win32Check takes a BOOL argument and returns it unchanged, but only if it is nonzero. Otherwise, it raises a Win32Exception. OleCheck takes an HRESULT and also returns it unchanged, unless FAILED(hr), in which case it also raises an exception. This way, one can write

    hr := OleCheck(SomeFunction(someArguments));

    In C++, this can be made a macro, with an added benefit that the macro can use __FILE__ and __LINE__ to construct the error message of the exception. Of course, one has to think carefully whether a call is “critical” (and must succeed or throw) or “optional” (and may succeed or fail without a serious effect on the calling function).

  21. SuperBK says:

    I also like the do { } while(false) construct, to keep from having goto’s or multiple return statements:

    if (somethingfails)

    return;

    if (somethingelsefailed)

    return

    instead:

    do

    {

    if (somethingfails)

    break;

    if (somethingelsefails)

    break;

    } while (false)

    // cleanup

    return;

  22. Jonathan,

    Good citation. As a counter, do you remember the old PCC (portable C compiler)? When you’d port Unix to a new platform, you’d use PCC to do it – it had templates for each language construct, and you’d plug in an assembly language version of that construct.

    PCC didn’t have a peephole optimizer or any other feature that could turn:

    mov ax, [value]

    inc ax,

    move [value], ax

    into:

    inc [value]

    But the template for ++ did take advantage of the language construct.

  23. Skywing says:

    Personally I dislike Hungarian notation. Proper variable naming makes it unnecessary in most circumstances, IMO.

    The worst thing is when you find some Hungarian named variables that are just plain wrong, though:

    typedef struct tagMENUITEMINFO {

    UINT cbSize;

    UINT fMask;

    UINT fType;

    UINT fState;

    UINT wID;

    HMENU hSubMenu;

    HBITMAP hbmpChecked;

    HBITMAP hbmpUnchecked;

    ULONG_PTR dwItemData;

    LPTSTR dwTypeData; // oh yes, LPTSTR […] dw!

    UINT cch;

    HBITMAP hbmpItem;

    } MENUITEMINFO, *LPMENUITEMINFO;

  24. Imperfect, I think you are wrong. Both for loops loop from 0 to 9.

    The second is the same as

    i = 0;

    while (i < 10)

    {

    // loop body…

    next:

    ++i;

    }

    with any continue; replaced by goto next;

    Both for loops would be different only in C++, and even then only if you had an overloaded operator++ on the type of i.

  25. James says:

    ‘i += 1′ vs. ‘++i’ is not a style issue. The STL iterator models are an example of this: forward iterators (like those of std::list) only support the increment and decrement operators. Random access iterators support addition and subtraction, including their assignment forms.

    ‘i += 1′ and ‘i++’ are even less interchangable, since they evaluate to different things. Just put ‘int j = ‘ in front, then look at j. The postfix operator is less efficient than the prefix operator when the compiler can’t determine that the copy isn’t necessary and doesn’t have side effects, but could be faster or slower than += depending on whether an implicit conversion is required for the argument (imagine i is a massive_number constructable from long).

    These operations have different meanings, and support for them (and their relative efficiencies) varies with the types used. Do you still think this is a simple matter of style?

    Proficient programmers try to rely on as few concepts as possible, to not conflate more and less general abstractions, or assume a larger set of concepts than necessary. Comparison is another good example: iterator loops generally use != as it is supported in more concepts, but conversely, std::map only uses an ordering predicate and determines equivalence by !p(a, b) && !p(b, a) rather than using a separate equality predicate (aside: you can often spot a C programmer amongst C++ programmers by whether their for-loops use < or !=).

    Imagine how much harder our lives would be if we reasoned about sorting algorithms considering <, <=, ==, !=, =>, >, assignment and copying, rather than just compare and swap? Your own code should be no different. If you end up implementing a less than operator for a user defined type, when you could have used inequality, you’ve gone and created another function to have bugs in (of course, if the type intuitively has the operator, that’s another matter).

    Okay, so you’re a systems programmer and you work mainly with native integer types. Maybe I should give you a break about the high-level stuff 😉

    My last point is that, while I also find the idiom loathsome, you’ve missed the structured programming argument about goto if you think ‘do .. while (false)’ has the same issues. I suggest you read Knuth’s "Structured Programming using Go To" paper.

    – James

  26. James,

    <snarky>STL? What’s that? It’s not part of the core language, is it?</snarky>

    My comment’s snarky but it’s relevent. Just because the STL authors chose to ignore the semantics of the language isn’t the fault of the language.

    There IS a semantic difference between i++ and ++i in the language, but there isn’t a semantic difference between i++ and i+=1 (assuming that they appear in a statement with equivilant operator precedence).

    Remember my comment about style in the final post in the series: Style’s about improving the maintainability of code. Any use of constructs that do NOT improve the readability of code should be avoided at all costs.

    And I personally consider operator overloads of all sorts to be a classic example of a situation where complexity is unnecessarily added (it’s without a doubt my least favorite language feature, closely followed by templates). We’ve been through that particular discussion before (see the all developers must know assembly language post for more info, it’s in the archives).

    Just consider that one senior developer spent literally a year doing nothing but dealing with the consequences of a development team choosing to use operator overloads in a performance critical portion of the operating system, and you’ll understand where my issues come from. Operator overloads hide code and thus hide complexity, which is utterly unacceptable in high performance systems code.

  27. James says:

    Larry,

    While I don’t necessarily agree with all of your sentiments, I do agree with your rationale that ease of maintenance is critical.

    Please allow me to suggest the Bulka and Mayhew book "Efficient C++" to people who are concerned about operator overloading in C++ after reading your reply. Overloading is syntactic sugar for making a function call, but as you suggest, you should understand the implications of making such a call.

    As for templates, that’s a discussion for another day. I could argue strongly in their favour (compare std::sort to qsort), but no doubt you have arguments as to their complexity (for authors at least).

  28. Actually my issues with templates are directly related to my issues with operator overloads: They hide complexity. Templates add the additional problem of hiding code size.

    It’s not always clear to people that having:

    std::vector<std::string> stringArray;

    std::vector<int> intArray

    brings in two complete copies of std::vector (and two copies of everything that std::vector uses) into their application. That’s the biggest evil of templates. I’ve seen real-world examples of unbelievable code bloat brought on by templates.

    The good news is that current linkers help to ameliorate this problem (by collapsing identical functions), which means that CComPtr<IUnknown> and CComPtr<IDispatch> can share implementations (otherwise you’d have a separate completely identical implementations of CComPtr for every single COM object your program used).

  29. 11/18/2004 6:21 PM Larry Osterman

    > There IS a semantic difference between i++

    > and ++i in the language,

    Right.

    > but there isn’t a semantic difference

    > between i++ and i+=1 (assuming that they

    > appear in a statement with equivilant

    > operator precedence).

    Wrong. There isn’t a semantic difference between ++i and i+=1 (assuming your ssumption). There is a semantic difference between i++ and i+=1 except when you’re discarding the result. (Of course style and readability do call for discarding the result and only letting i benefit from the side effect, but the language definition doesn’t.)

  30. Sean Malloy says:

    IMO, I would break the single return point rule rather than use a goto.

    #define HANDLE_FAILURE(hr) if ((hr) != S_OK) return hr;

    HRESULT MyFunction()

    {

    HRESULT hr;

    hr = myCOMObject->Method1();

    HANDLE_FAILURE(hr);

    hr = myCOMObject->Method2();

    HANDLE_FAILURE(hr);

    hr = myCOMObject->Method3();

    HANDLE_FAILURE(hr);

    hr = myCOMObject->Method4();

    HANDLE_FAILURE(hr);

    hr = myCOMObject->Method5();

    HANDLE_FAILURE(hr);

    return hr;

    }

    ofcourse, if you have cleanup code there’s an issue that can be solved by a goto (as you suggested)

  31. Sean,

    That only works if Method1() doesn’t have side effects (like allocating memory).