What does style look like, part 6

Previously in the series, I've touched on indentation and commenting, The next aspect of "style" that I want to talk about is variable (and function) naming conventions.

Upon reflection, I hungarianized the code way too early in the series, it properly belongs in this article, so lets go back to the beginning, and see what happens to the code with different naming styles.  Here's the original code:

#include "list.h"main(int argc, char *argv[])  {    int i, N = atoi(argv[1]), M = atoi(argv[2]);    Node t, x; initNodes(N);    for (i = 2, x = newNode(1); i <= N; i++)      { t = newNode(i); insertNext(x, t); x = t; }    while (x != Next(x))      {        for (i = 1; i < M ; i++) x = Next(x);        freeNode(deleteNext(x));      }    printf("%d\n", Item(x));  }

 

Rewriting the code in hungarian yields:

#include "list.h"main(C cArg, SZ rgszArg[])  {    I iNode, cNodes = atoi(rgszArg[1]), cNodesToSkip = atoi(rgszArg[2]);    PNODE pnodeT, pnodeCur; InitNodes(cNodes);    for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)      { pnodeT = PnodeNew(i); InsertNext(pnodeCur, pnodeT); pnodeCur = pnodeT; }    while (pnodeCur != PnodeNext(x))      {        for (iNode = 1; iNode < cNodesToSkip ; iNode++) pnodeCur = PnodeNext(pnodeCur);        FreeNode(PnodeDeleteNext(pnodeCur));      }    printf("%d\n", Item(nodeCur));  }

So what changed?  First off, all the built-in types are gone.  Hungarian can use them, but not for most reasons.  Next, the hungarian types "I", "C", and "SZ" are used to replace indexes, counts and strings.  Obviously the C runtime library functions remain the same.  Next, I applied the appropriate prefix - i for indices, c for counts, p<type> for "pointer to <type>".  The Node type was renamed PNODE, in Hungarian, all types are uppercased.  In Hungarian, the name of the routine describes the return value - so a routine that returns a "pointer to foo" is named "Pfoo<something relevent to which pfoo is being returned>".

But, of course, there are LOTs of alternative naming conventions other than Hungarian.  For example, Microsoft's published extensive documentation on design guidelines for class library developers who want to publish managed libraries.  Lets recast the sample application in the .Net coding style:

So what does the .Net/CLR coding style look like?  First off, the CLR naming guidelines for methods require that they use PascalCase, and that verbs or verb phrases are used to name the methods.  Second, the CLR naming guidelines require that parameters to functions obey camelCase.  This example doesn't have any parameters declared other than argv and argc, so that aspect dosn't change.  But the naming guidelines are pretty clear about the choice of names: parameters should always be spelled out, abbreviations should only be used if developers generally understand them.  And you should NOT use hungarian notation.  They go on to say that "good names describe semantics, not type".  I find this last one fascinating, since it highlights one of the major differences between systems Hungarian and applications Hungarian - Systems Hungarian represents the type, apps Hungarian represents the semantics.

Because the CLR naming conventions don't describe anything about local variables, the CLR standards for local variables have to be inferred.  This makes sense because the CLR coding standards are all about describing the public interfaces to components, and not their internal standards.  As such, they explicitly don't cover local variables (or member variables (fields), except to say that you're not allowed to have any).  But a CLR-ish style can be inferred.  The first thing that's clear is that abbreviations are out, and that variables should represent full English words.  I'm going to go out on a limb and keep the "i" as an indexer because it's a convention that's pretty standardized.  This still leaves the camelCase vs PascalCase issue open, just for grins, lets go with camelCase. 

#include "list.h"main(int argCount, char *args[])  {    int i,  numberOfNodes = atoi(args[1]), nodesToSkip = atoi(args[2]);    Node newNode, currentNode; InitNodes(numberOfNodes);    for (i = 2, currentNode = NewNode(1); i <= numberOfNodes; i++)      { newNode = NewNode(i); InsertNext(currentNode, newNode); currentNode = newNode; }    while (newNode != Next(currentNode))      {        for (i = 1; i < nodesToSkip ; i++) currentNode = Next(currentNode);        FreeNode(DeleteNextNode(currentNode));      }    printf("%d\n", Item(currentNode));  }

One thing that was pointed out by one of my co-workers is that it's often extremely useful to visually differentiate between parameters, local variables, and member variables, and in this example you can't differentiate between parameters and locals.  This isn't as big a deal in a small example like this, but for a larger, more complicated example, it might turn into a big deal.

If we decide that parameters are declared in camelCase, if you apply this paradigm, you can name local variables with PascalCase.  That leaves member variables.  Even though the CLR guidelines for members explicitly prevents prefix characters (like m_ and g_), if you just use "_" as a prefix character, it allows you the ability to differentiate between them.

Lets look at the example using camelCase for parameters and PascalCase for local variables.

#include "list.h"main(int argCount, char *args[])  {    int I,  NumberOfNodes = atoi(args[1]), NodesToSkip = atoi(args[2]);    Node NewNode, CurrentNode; InitNodes(NumberOfNodes);    for (I = 2, CurrentNode = NewNode(1); I <= NumberOfNodes; I++)      { NewNode = NewNode(i); InsertNext(CurrentNode, NewNode); CurrentNode = NewNode; }    while (NewNode != Next(CurrentNode))      {        for (I = 1; I < NodesToSkip ; I++) CurrentNode = Next(CurrentNode);        FreeNode(DeleteNextNode(CurrentNode));      }    printf("%d\n", Item(CurrentNode));  }

Hmm. Two major issues jump out immediately with this example.  The first one is:

NewNode = NewNode(i);

There's a major inconsistency here - the local variable and the function have exactly the same name (of course, in a non case-sensitive language, this issue is more significant).  With luck, the ambiguity can be resolved by the compiler, but that's only for this case - and a style has to work for ALL cases (pun unintended).  It's pretty horrible when you get to the point when you realize that youve got to either (a) change your code for the sake of style of (b) change your style for the sake of the compiler.

This illustrates an important aspect of a naming convention: Whatever you do, you shouldn't have a naming convention that allows for ambiguity.  In this case, each naming (and capitalization) convention can be considered as a puzzle piece.  When you come up with your final coding style, you need to take all of these into account.  You have four variables: Global Function/Member Function naming convention/case, Parameter naming convention/case, Local Variable naming convention/case, and Member Variable naming convention/case.  And your have a myriad of possible naming convention/case options, from hungarian to CLR, from PascalCase, to camelCase, to whatever else you can come up with.

When defining a style, you need to balance all of these aspects to come up with a coherent whole.  Fundamentally, it doesn't matter which naming/case convention you chose for each of the variables, just as long as it's aesthetically pleasing to you and your team.

So lets take the sample application and reframe it in the first of the CLR conventions I mentioned above (camelCase local variables):

/*++ *        <Copyright Notice> * *  Module Name: *        MyProgram.c * *  Abstract: *        This program implements a solution to the Josephus problem.   *        Code taken from Robert Sedgewick's "Algorithms in C, Third Edition", *        Program 3.13, on Page 103. * *  Author: *        Robert Sedgewick * *  Revision History: * *    LarryO             11/12/2004     Restructured for weblog posts. *    Robert Sedgewick   <Unknown>      Created.   --*/#include "list.h"/*++    main - Implements the Josephus problem and prints the result. * * *     main (cArg, rgszArg) * *     ENTRY   cArg - number of arguments. *             rgszArg - array of cArg null terminated strings *     EXIT    None. * *     <discussion of internals, one or more paragraphs> * * --*/main(int argCount, char *args[]){    int i    int numberOfNodes = atoi(args[1]);    int nodesToSkip = atoi(args[2]);    Node newNode;    Node currentNode;    InitNodes(numberOfNodes);    //    // Create a list of cNodes nodes.      //     for (i = 2, currentNode = NewNode(1); i <= numberOfNodes; i++)    {        newNode = NewNode(i);        InsertNext(currentNode, newNode);         currentNode = newNode;     }    //    // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.    //    while (newNode != Next(currentNode))    {        for (i = 1; i < nodesToSkip ; i++)        {            currentNode = Next(currentNode);        }        FreeNode(DeleteNextNode(currentNode));    }    //    // Print out the value of the current node.    //    printf("%d\n", Item(currentNode));}

When I first rewrote the code in hungarian, I got screams from the commenters about how hideous it looked in hungarian. Just for grins, lets compare the code portions of the two examples (hungarian and CLR).

First the hungarian:

main(C cArg, SZ rgszArg[]) {    I iNode;    I cNodes = atoi(rgszArg[1]);    I cNodesToSkip = atoi(rgszArg[2]);    PNODE pnodeT;    PNODE pnodeCur;    InitNodes(cNodes);    //    // Create a list of cNodes nodes.      //     for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)     {        pnodeT = PnodeNew(iNode);        InsertNext(pnodeCur, pnodeT);         pnodeCur = pnodeT;     }    //    // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.    //    while (pnodeCur != PnodeNext(x))     {        for (iNode = 1; iNode < cNodesToSkip ; iNode++)         {            pnodeCur = PnodeNext(pnodeCur);        }        FreeNode(PnodeDeleteNext(pnodeCur));    }    //    // Print out the value of the current node.    //    printf("%d\n", Item(nodeCur));}

Now, the CLR:

main(int argCount, char *args[]){    int i;    int numberOfNodes = atoi(args[1]);    int nodesToSkip = atoi(args[2]);    Node newNode;    Node currentNode;    InitNodes(numberOfNodes);    //    // Create a list of numberOfNodes nodes.      //     for (i = 2, currentNode = NewNode(1); i <= numberOfNodes; i++)    {        newNode = NewNode(i);        InsertNext(currentNode, newNode);         currentNode = newNode;     }    //    // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.    //    while (currentNode!= Next(currentNode))    {        for (i = 1; i < nodesToSkip ; i++)        {            currentNode = Next(currentNode);        }        FreeNode(DeleteNextNode(currentNode));    }    //    // Print out the value of the current node.    //    printf("%d\n", Item(currentNode));} 

What I find fascinating about putting these two examples side-by-side is that stylistically, there's actually not that much difference between the two - if you hold all things constant and just vary the naming convention, the code isn't that different.

Tomorrow, lets look at some C/C++ minutae.

 

EDIT: Fixed some typos (Thanks David Mellis).