Hungarian Notation


We’ve been having an internal discussion recently about coding guidelines and the rules that should be in place to create the “best” code possible.  “Best” is, of course, open to interpretation: readability, maintainability, perf, etc. all play into it.  One of the questions that has come up is what sort of naming convention we should be using.  Since we’re all programmer geeks, we want simple and clear rules that everyone can follow.  And when it comes to simple rules for naming, one of the first things that springs to mind is Hungarian Notation (HN).  There are wildly mixed feelings about HN here, and i wanted to hear from you: do you use it, and how do you feel about it?

For those who don’t know, HN was created by Charles Simonyi at Microsoft.  You can read more about it at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvs600/html/HungaNotat.asp

Condensed, the goals of HN are as follows.

  1. The names will be mnemonic in a very specific sense: If someone remembers the type of a quantity or how it is constructed from other types, the name will be readily apparent.
  2. The names will be suggestive as well: We will be able to map any name into the type of the quantity, hence obtaining information about the shape and the use of the quantity.
  3. The names will be consistent because they will have been produced by the same rules.
  4. The decision on the name will be mechanical, thus speedy.
  5. Expressions in the program can be subjected to consistency checks that are very similar to the “dimension” checks in physics.

The specific rules are as follows:

  1. Quantities are named by their type possibly followed by a qualifier. A convenient (and legal) punctuation is recommended to separate the type and qualifier part of a name. (In C, we use a capital initial for the qualifier as in rowFirst: row is the type; First is the qualifier.)
  2. Qualifiers distinguish quantities that are of the same type and that exist within the same naming context. Note that contexts may include the whole system, a block, a procedure, or a data structure (for fields), depending on the programming environment. If one of the “standard qualifiers” is applicable, it should be used. Otherwise, the programmer can choose the qualifier. The choice should be simple to make, because the qualifier needs to be unique only within the type and within the scope—a set that is expected to be small in most cases. In rare instances more than one qualifier may appear in a name. Standard qualifiers and their associated semantics are listed below. An example is worthwhile: rowLast is a type row value; that is, the last element in an interval. The definition of Last states that the interval is “closed”; that is, a loop through the interval should include rowLast as its last value.
  3. Simple types are named by short tags that are chosen by the programmer. The recommendation that the tags be small is startling to many programmers. The essential reason for short tags is to make the implementation of rule 4 realistic. Other reasons are listed below.
  4. Names of constructed types should be constructed from the names of the constituent types. A number of standard schemes for constructing pointer, array, and different types exist. Other constructions may be defined as required. For example, the prefix p is used to construct pointers. prowLast is then the name of a particular pointer to a row type value that defines the end of a closed interval. The standard type constructions are also listed below.
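
To make the rules concrete, here is a minimal sketch of the rowFirst / rowLast / prowLast example from the list above. The Row alias, the rgValue array and both function names are made up for illustration; only the naming pattern (type tag, capitalized qualifier, p prefix for pointers, closed interval for Last) comes from the rules themselves.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type: in Hungarian terms its tag is "row".
using Row = int;

// Sums the values of every row in the closed interval [rowFirst, rowLast].
// "First" and "Last" are standard qualifiers; Last means the interval is
// closed, so the loop includes rowLast itself.
int SumRows(const std::vector<int>& rgValue, Row rowFirst, Row rowLast) {
    int sum = 0;
    for (Row row = rowFirst; row <= rowLast; ++row) {
        sum += rgValue[static_cast<std::size_t>(row)];
    }
    return sum;
}

// Rule 4: the pointer prefix p composes with the tag and the qualifier,
// so prowLast is a pointer to the last row of a closed interval.
int SumRowsThrough(const std::vector<int>& rgValue, const Row* prowLast) {
    return SumRows(rgValue, 0, *prowLast);
}
```

With rgValue holding {10, 20, 30, 40}, SumRows(rgValue, 1, 2) adds rows 1 and 2 and gives 50, because rowLast is included in the loop.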

It all seems well and good, but i end up finding the code written in this way completely unreadable.  One of the reasons for this might be the following suggestion:  “Conclusion:  Do not use qualifiers when not needed, even if they seem valuable.”

Wow…  so you end up with code that looks like:

#include "sy.h"
extern int *rgwDic;
extern int bsyMac;
struct SY *PsySz(char sz[])
{
   char *pch;
   int cch;
   struct SY *psy, *PsyCreate();
   int *pbsy;
   int cwSz;
   unsigned wHash=0;
   pch=sz;
   while (*pch!=0)
      wHash=(wHash<<5)+(wHash>>11)+*pch++;
   cch=pch-sz;
   pbsy=&rgbsyHash[(wHash&077777)%cwHash];
   for (; *pbsy!=0; pbsy = &psy->bsyNext)
   {
      char *szSy;
      szSy= (psy=(struct SY*)&rgwDic[*pbsy])->sz;
      pch=sz;
      while (*pch==*szSy++)
      {
         if (*pch++==0)
            return (psy);
      }
   }
   cwSz=0;
   if (cch>=2)
      cwSz=(cch-2)/sizeof(int)+1;
   *pbsy=(int *)(psy=PsyCreate(cwSY+cwSz))-rgwDic;
   Zero((int *)psy,cwSY);
   bltbyte(sz, psy->sz, cch+1);
   return(psy);
}

I dunno, but i can’t read that code at all.  Let’s say i did know Hungarian, would that help?  I’m not so sure.  Starting at the top:

rgwDic.  It’s an array of words called “dic”.  Not sure what “dic” is but maybe it’s a dictionary.  Ok, so a dictionary maps keys to values somehow.  But what are the keys, what are the values?  Is it a dictionary that uses hashes?  I have no idea.  I really don’t have a single clue what rgwDic is right now.   Amazingly, Simonyi recommends that that name actually be grpsy.  grpsy… i would be completely lost with that.  Ok onto the next field.

bsyMac.  No clue.  We’re doing something with a SY type… so it’s like the last SY out there…   Of course, i have no idea what an SY is… so i’m still clueless.

char* pch.  Ok.  It’s a precompiled header.  Just kidding :)   We have some string.  I would prefer std::string, but that’s just me.

int cch.  Some count of characters.  Is it related to pch?  I have no idea.

Ok, some local function def follows.

And at this point i’m completely lost.   I’m not even going to go on to the rest of the code.  The lack of clear names has me completely confounded.  I can’t tell how things are related and i’m scared out of my mind about touching even the slightest character in this code.

Have you had experience using Hungarian in a project?  Did it turn out to be a good thing, a bad thing, or something you didn’t even notice?  Personal experiences would be very appreciated.

Note: we’ve been discussing coding conventions in the context of writing C# code (if that helps).


Comments (47)

  1. Cyrus, drop me a line at work, I’ll get you a copy of the Hungarian manual that Doug Klunder wrote – he explains what Dic and Mac (and Max) are.

    There are a couple of typos in your example (->bsy which should be ->pbsyk for example).

    You should talk to some people over in Office or in Exchange – they’ve used strict hungarian extensively with great success.

  2. Larry, that code snippet was taken from the MSDN article i was referring to. I’ve noticed some other problems with it (such as simple syntactic problems). I do know what Dic and Mac are… however, they help me not at all when trying to grok that code.

  3. David Betz says:

    Thank goodness for the CLS!!!

  4. I felt the same way as you did before I joined Exchange. After about 3 months of living and breathing hungarian, it gets a lot easier.

    Btw, there’s a rendering issue with the article, you’re losing the left chunk after the code.

  5. Jens Samson says:

    I have read countless articles about how reading code is more difficult than writing code. This is a perfect illustration. That it gets better after three months is an indication that if your best developer leaves, you might be lucky if a new developer understands what was written within three months …

    I have to fight every day against these C-ish coding standards. My boss says I’m using too long variable names. Who cares? My code is readable and with intellisense I don’t need to type it all …

  6. I’ve had a good experience using Hungarian Notation on projects in languages with loose typing (i.e. PHP). There, since you typically don’t have a graphical debugger nor type conversion enforcement, it’s important to keep in mind what the type of each variable is — and hungarian-style prefixes help.

    Of course the prefixes have to be followed with a useful variable name (i.e. $sQueryString for a query command string, or $iNumberOfWidgets)

  7. Stuart Dootson says:

    re: Larrys rendering problem – only in IE :-) In Firefox and K-Meleon (both Gecko based), it’s fine…

  8. Joe Duffy says:

    One thing to note, given that you can see the declaration very clearly for all these variables, Hungarian doesn’t buy you much. IMHO for broader scopes or denser code (though that function’s pretty damned dense!) it starts to show at least marginal value. Not that I’m defending Hungarian, mind you.

    For comparison purposes, you should post a C# example of the same algorithm with more meaningful variable names. It’s hard to determine cleanly whether it’s the algorithm which is complex, or the coding style itself.

  9. Ben says:

    I was under the impression that MS were moving away from HN for their .Net development:

    http://blogs.msdn.com/brada/articles/361363.aspx

  10. duncan says:

    OK, I admit it – I use a form of hungarian notation – I prepend a lowercase m to all the member variables in my class. I find an indication of scope useful (actually I wish the IDE could show this instead using font colouring or something but that’s another topic).

    But in general I find that hungarian notation is confusing and pointless – and worse it is frequently out of date when someone changes the type of a variable without changing its name…

    frankly I’d prefer to stay well away from all types of hungarian…

  11. Rosyna says:

    The problem with hungarian notation is that it often tries to be too descriptive about the type, and very, very non-descriptive about what the freakin var is used for. Consider the following example.

    CFDictionaryRef fontDict=NULL; // unless I’m mistaken the above code example dereferences what could be a NULL pointer…

    Anywho, that is clearly a Dictionary about a font.

    If we look at the hungarian notation version:

    CFDictionaryRef cpDic=NULL;

    (Is there some length limit to hungarian notation?)

    Anywho, a CFDictionaryRef is an opaque data type. Meaning you only know it as a const void pointer. You don’t know the actual contents of it. This allows the structures that describe it to arbitrarily change between OS revisions/platforms (CoreFoundation is Open Source/Cross Platform) without it causing any portability problems because someone references a member of a struct.

    The other problem is when you have a Windows API specific type that is a size of a word in Win16 but the size of a double word in Win32. The code still says wSlap even though it should be dwSlap and because hungarian notation is considered "infallible", the developer will assume it is the size of a word, make a buffer that size, then get all kind of whacked results when the buffer can’t hold the value.
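
A small sketch of one way around the stale-prefix trap described above: take the buffer size from the type with sizeof instead of from what the name claims. SlapValue, SlapBuffer, PackSlap and UnpackSlap are all hypothetical names, and the 32-bit width is an assumption standing in for "a type that got widened".

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical API type: imagine it was 16 bits wide on one platform and
// was later widened to 32 bits, while variables kept their "w" prefix.
using SlapValue = std::uint32_t;

// The buffer is sized from the type itself, so it stays correct if the
// type changes again.
using SlapBuffer = std::array<unsigned char, sizeof(SlapValue)>;

// The parameter keeps the stale "w" prefix on purpose: the code is still
// right because nothing here trusts the prefix.
SlapBuffer PackSlap(SlapValue wSlap) {
    SlapBuffer buffer{};
    std::memcpy(buffer.data(), &wSlap, sizeof(wSlap));
    return buffer;
}

SlapValue UnpackSlap(const SlapBuffer& buffer) {
    SlapValue value = 0;
    std::memcpy(&value, buffer.data(), sizeof(value));
    return value;
}
```

A value packed with PackSlap comes back unchanged from UnpackSlap whatever width SlapValue has, which is exactly what a hand-sized "word" buffer would not guarantee.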

  12. Stu Smith says:

    I think he had the germ of a good idea, but unfortunately he foolishly proposed it as a task that a human should perform, instead of it being a compiler task. I guess he didn’t think beyond the current tools of the day (ie C).

    As an example: People tend to write iVar or nVar "because it’s an integer" (duh). In ‘proper’ Hungarian that would be iVar for an index and nVar for a count. What he should have proposed would be something like this:

    abstract class IntegerType { … } // No such thing as an instance of an integer

    class IndexType : IntegerType { … }

    class CountType : IntegerType { … }

    class Array<T>

    {

    CountType Count { get { … } }

    T this[IndexType index] { get { … } }

    }

    Here’s my suggestion for a good coding guideline… don’t use for or foreach unless you’re going to use every value (ie don’t put an if in a for). Reasons: forces you to think about faster data-structures, prevents tons of nested code, and is easier to read (the filters are right up there with the loop, instead of buried somewhere inside).
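
One possible reading of that guideline, sketched in C++: state the filter once, up front, and let the loop body use every element it visits. SelectEven and SumOfEvens are illustrative names, and "even numbers" is just a stand-in predicate.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns only the even values; the filter is stated once, up front.
std::vector<int> SelectEven(const std::vector<int>& values) {
    std::vector<int> evens;
    std::copy_if(values.begin(), values.end(), std::back_inserter(evens),
                 [](int value) { return value % 2 == 0; });
    return evens;
}

// The loop now uses every element it visits; there is no nested if.
int SumOfEvens(const std::vector<int>& values) {
    int sum = 0;
    for (int value : SelectEven(values)) {
        sum += value;
    }
    return sum;
}
```

The trade-off in this sketch is an extra copy of the filtered elements; a lazily evaluated filter would avoid that at the cost of a little more machinery.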

  13. David says:

    I find that the downsides, including having to update it when a variable type changes (and sometimes people forget to), having to mentally parse the type prefix even when it’s irrelevant to figure out what’s going on in the code, and the fact that it fails to encourage keeping functions short enough to easily track the initial declaration of the variable, outweigh the advantages by a fair bit.

  14. Orion Adrian says:

    I think that hungarian notation is probably one of the more mis-understood concepts. It’s not about encoding data types as much as it is about coding semantics. But then again it is about encoding data types (i.e. how many bytes am I using). The problem I see is that it has two fundamentally different uses in different environments and they get confused.

    I see several problems with hungarian notation, some of them its own fault and many the fault of the languages themselves.

    Currently languages like C++ and C# straddle the line between really caring about the way data is stored and not caring about how the data is stored. I think it is this fundamental problem that makes hungarian a problem. That and using hungarian doesn’t prevent you from having to come up with good names for variables.

    nVar and iVar represent semantics. These are good examples of hungarian. They actually tell me something. They tell me semantic information (one is a count, one is an index). However dw and w for dword and word don’t tell me anything semantic about the type. These are only going to get me into trouble.

    The fundamental problem comes in how we encode information about an object in its name. Some things are safer than others. It really comes down to the realities about naming or labeling something and there’s quite a world about that as well.

    When choosing a name for something we need to pick something that will instantly tell us what it is but not tell us anything that might change about it. Of course this all comes down to probabilities for the most part.

    dwOptions might seem like a good name at the beginning, but since we can’t guarantee that it will always use a dword for storage it’s unsafe.

    flagOptions is probably safer since what we really want to say is that the information is transferred using flags. If this changes then we’ve fundamentally changed how we are transmitting that data and we have to change all the calling code anyways.

    </streamOfThought>

    Orion Adrian

  15. Jeff Parker says:

    Hmm is it just me or is your blog a little off there, Everything is shifted to one side.

    Anyway, my own personal rule of thumb is that I use Hungarian in any language that is not inside VS, or that is not well represented in VS. So these would be VBScript, JavaScript, batch files, anything of that nature. The reason is that hungarian notation really helps me keep track of which variables are what and how it is all declared at a glance.

    Now for things like C# and VB.net and so on. Well with a mouse over hover you get tool tips in VS that show you what things are. And the code just looks better to me. However I can imagine any of my .net apps being opened outside of VS. If it were it would be harder to read, as camel, pascal cased.

  16. Greg says:

    Is Hungarian notation incompatible with using descriptive and even long variable names? I never thought so. You can do both.

    int* pnRecordsRemaining;

    BOOL bHasFileChanged;

    and so on.

    You can write code that reads closer to natural language, and get the benefits of the Hungarian prefix, both. In fact it helps that the prefix is a short code. That lets the rest of the variable name stand out when reading it in "natural language" mode vs. drilling down on details like types.

  17. Rosyna says:

    Greg, in which case would hasFileChanged not be a boolean?

    And I wonder, is it practice at MS to not initialize variables at creation time? Sample code never seems to. Always best to, for when code evolves.

  18. In 2001, I "converted" a group of VB programmers used to a common derivative of hungarian to use the normal c# camel case style. That was all good and we have never looked back with only one exception. Similar to Jeff Parker’s comment, for the control names in asp.net we went back to using a hungarian-notation-like syntax because we wanted to differentiate the control that held the text value from the variable that held the actual value. As we get better at building mvc style code, this is becoming less of an issue but we haven’t changed our guidelines just yet.

  19. William says:

    Hungarian notation is a C abomination. I’ve used it extensively, because I’m a chameleon coder. I want my code to fit in with the environment it’s coded in. This means retaining the style used by the code I’m maintaining, or the general style of the majority of the APIs that new code will be interfacing with. Since I’ve done 10+ years of Win32 and MFC coding, I’ve done an extensive amount of hungarian notation. Here’s what’s wrong with it.

    1) Even from a C stand point, hungarian is cryptic. The prefixes are all meaningless single character codes. In the simple cases, you can learn what they mean and survive. sText is fairly obvious for example. But, as they grow, things become less obvious. Is it pcsBuffer? scpBuffer? psBuffer? Can anyone really read any of that? And this is still not as complex as hungarian can get. *I* may know the "proper" way to do this and can read the code, but most folks can’t.

    2) From a C++, C# or other OO language, hungarian starts to look even worse. The notations we have only cover the C type system. The vast majority of the types dealt with in OO languages are user defined. What prefixes do you use then? I’ve seen people use a single prefix, o maybe, indicating a UDT type, but when the majority of the types in your code are UDTs, this isn’t helpful in any way, and the rest of hungarian starts to seem even less appealing. Worse, I’ve seen people try and create new prefixes for every UDT. This makes the code even less understandable, since no one but the author is going to know what any of the prefixes mean.

    3) In C, the prefixes become useful because the types are declared at the beginning of a function and when a variable is first used, someone reading the code may have to scroll/scan up several lines to determine the type. But in most OO languages it’s possible and preferred for numerous reasons to declare the variables as close as possible to the first use. You rarely have to do any scrolling/scanning to fully understand the type of a variable at the point in code in which it is used. The other case in C is for structures, which are likely declared in entirely different files from where they are used. But in OO languages you usually don’t deal with structures but classes and with classes you usually don’t deal with data elements directly, but instead call methods on the class which affect state. See, the reason for hungarian is reduced if not eliminated by these language constructs.

    4) Even though (3) seems to indicate that hungarian might be a good thing in C, I don’t really agree with that either. It generally leads to less comprehension than simple naming conventions. I can’t tell you how many times I’ve seen names become less meaningful because a developer applied hungarian and thought that gave enough information. Your rgwDic is a good example of this. Even after assuming the reader knows enough about hungarian to determine what type this refers to, they are left entirely clueless as to what its purpose is. There’s too much syntactic information and not enough semantic information in the name to be useful.

    Prefixes are generally a useful thing. As some people have pointed out, they are extremely useful for scope information, not only to indicate the scope to the reader but also to prevent name clashes. But generally, prefixes need to be kept to a small number of short codes for very specific and frequent cases, and the rest of the time should be fully spelled out portions of the name, which actually makes them seem less like prefixes to begin with.

  20. RonO says:

    I used to be a die-hard HN (as type not intent, unfortunately) user. I used as few characters as possible to describe the type of the variable, but didn’t typically scrimp on the describing part of the variable.

    When I started using VS.NET, I decided to follow the recommendation to stop using HN. The only "exception" is for visual controls. One of the training courses I took for .NET had us post-fix the type on our names. Thus, a name stored in a text box would be NameTextBox. I use a shorter version if I can help it (e.g., NameText). I’m not really thrilled with it, but as Philip suggests, it seems good to differentiate our visual controls from other variables.

  21. Larry: "Btw, there’s a rendering issue with the article, you’re loosing the left chunk after the code. "

    Must be an IE problem. Looks fine in FireFox to me :)

    I redid the code a bit, hopefully it looks better now.

  22. Ilya:

    "I’ve had a good experience using Hungarian Notation on projects in languages with loose typing (i.e. PHP). There, since you typically don’t have a graphical debugger nor type conversion enforcement, it’s important to keep in mind what the type of each variable is — and hungarian-style prefixes help.

    Of course the prefixes have to be followed with a useful variable name (i.e. $sQueryString for a query command string, or $iNumberOfWidgets) "

    Agreed. In PHP it makes sense. But in C++/C# it seems unnecessary. Also, from reading Simonyi’s treatise it looks like you wouldn’t follow the prefix with a useful variable name. Which i would really hate.

  23. Joe: "One thing to note, given that you can see the declaration very clearly for all these variables, Hungarian doesn’t buy you much. IMHO for broader scopes or denser code (though that function’s pretty damned dense!) it starts to show at least marginal value. Not that I’m defending Hungarian, mind you."

    This is why i’m also against dense code. Code should be light and simple to grok.

    "For comparison purposes, you should post a C# example of the same algorithm with more meaningful variable names. It’s hard to determine cleanly whether it’s the algorithm which is complex, or the coding style itself."

    I’ll try to do that. Unfortunately, i can barely tell wtf this function does because it’s so darn confusing 😉

  24. Ben: " I was under the impression that MS were moving away from HN for their .Net development: "

    Yes, that’s true. We’re discussing the merits of that decision.

  25. Duncan: "

    frankly I’d prefer to stay well away from all types of hungarian…"

    Hey!! My best is hungarian. And i don’t think she’d like your advice that i stay well away from her! :)

  26. Stu: I agree 100%. It seems like hungarian is there to solve an issue that the type system exists to solve. In our own code we have integers that represent far too many things. Counts, indexes, deltas. I would much prefer to have concrete types for all of those and then have the compiler make sure typesafety is preserved.
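
A rough sketch of what "let the type system do it" could look like, written in C++ rather than C# purely for illustration. Index, Count and IntArray are hypothetical types, not anything from a real library; the point is only that an index and a count stop being interchangeable integers.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper types: both hold an integer, but they are distinct
// types, so the compiler rejects using one where the other is expected.
struct Index { std::size_t value; };
struct Count { std::size_t value; };

// An index advanced by a count (a delta) is still an index.
inline Index operator+(Index index, Count delta) {
    return Index{index.value + delta.value};
}

class IntArray {
public:
    explicit IntArray(std::vector<int> items) : items_(std::move(items)) {}

    // The size comes back as a Count, never as a bare integer.
    Count count() const { return Count{items_.size()}; }

    // Element access accepts only an Index; std::vector::at range-checks it.
    int at(Index index) const { return items_.at(index.value); }

private:
    std::vector<int> items_;
};
```

With these definitions array.at(Index{0} + Count{2}) compiles, while array.at(array.count()) does not, which is the mistake an i or n prefix can only hint at.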

  27. Personally, I don’t see any value in including the type of a variable in its name. If you want to know the type of a variable, look up its declaration.

    I can understand why it would have been valuable when Windows was first created. If you are using a plain text editor to edit code that has hundreds of thousands or millions of lines of code, navigating to the definition of a symbol in order to determine its type can be a bit of a chore.

    However, with modern development tools that can "instantly transport" a user from a symbol’s usage to its definition, identifying the type of a variable is trivial, and the extra info in the name serves no real purpose.

    My vote would be to stay away from hungarian notation. It adds no value, and it makes code look down right ugly.

  28. Dave Aspinwall says:

    I think Hungarian notation is great. In fact, it should be used for everything, not just code development. For instance, see how much clearer things are if we prefix all nouns with a ‘n’, verbs with a ‘v’, adverbs with a ‘d’, and adjectives with a ‘j’. Vdoesn’t nthis vlook jbetter?

    Nthat’s dnot dreally jenough, though. Jplural nnouns vshould dreally vhave an nindicator, "pl". plnsubjects vshould vhave a "sbj" and sbjplnobjects vshould vhave a "obj" nprefix. Nwe vcan vuse "tr" for jtransitive nverbs, and trvuse "int" for jintransitive nverbs.

    Jour plndocuments vwill dthen vbe jmuch jclearer.

  29. Dave: you owe me a new keyboard! There’s milk snorted all over this one.

  30. Dave: you owe me a new keyboard! There’s milk snorted all over this one.

  31. Chris says:

    Cyrus, the non-Hungarian equivalent of that example is code with one-letter variable names. Hungarian requires a descriptive prefix on your variables. You’re still supposed to append a meaningful suffix on the end. Otherwise you just get gibberishical alphabety-soup code like your example.

    The actual design of the code seems to have been optimized for illegibility, complicating the exercise.

  32. David says:

    I nominate Dave Aspinwall in the category of "best comment on Hungarian Notation EVER."

  33. DrPizza says:

    What’s even worse is that the hungarian idiocy extends to members of structs (though not always, as is MS’s way. It’s QuadPart and LowPart and HighPart in a LARGE_INTEGER, for example). So even if I don’t have any of that shit in _my_ code I still come across it, and have to access members as "dwXXXX" etc..

    Ugh.

  34. David Levine says:

    When I was writing code in C I loved Hungarian – couldn’t live without it. When I was coding in C++ I liked it a lot. Now that I am coding in C# I hate it…the need for Hungarian is inversely proportional to the quality of the tools I am using… Intellisense makes the need for Hungarian go almost completely away. Now it is only useful if I print out some code on dead trees, and usually not even then.

    Now Hungarian is a tool that is looking for a problem to solve. Improvements in tool technology have made it approach obsolescence.

    Give me Intellisense or give me…well, never mind.

  35. I found a post on one of the MSDN blogs the other day about Hungarian Notation being adopted by some of the Microsoft Development teams, and how they should go about using it.

    Where I work, we don’t use Hungarian Notation, and so far we have managed to cope without it. Having said that, the language that we use (Progress 4GL) is meant for database access, and therefore relatively few types actually exist, and more complex types are created by creating database tables. The most that we really stretch to is "vVarName" for a variable, "ipParamName"/"opParamName"/"iopParamName" for input/output/input-output parameters (respectively). Some people do go further and may use "c" for a string, but that is about it. We also call procedures or functions defined in include-files "i_ProcName" to make it easier to identify their location, and sometimes people may name variables inside of procedures located in include files so that they do not run into "Multiple Variable Declaration" problems. So far, in the extensive project that I work on, it’s not been a problem.

    The post also links to a page which describes an internal Microsoft layout format for programs/source code. Again, we don’t have a strict set of rules for this. We have rough guidelines (like "Line up all the equal signs") but quite often these are broken in order to actually improve readability.

    What this actually means though, is that (by knowing each person’s individual style), it can be quite easy to pin-point who did a piece of work where it is not already obvious who had done the work.

    I doubt that my Project Managers will ever feel the need to develop a more rigorous standard, or enforce rules about layout and style, and for the most part I don’t think that we will need to either.

    We’ll just have to see what happens with time…

  36. Eric Lippert says:

    I wrote a bit about Hungarian in my blog.

    http://weblogs.asp.net/ericlippert/archive/2003/09/12/52989.aspx

    The commenter above is 100% correct — Hungarian as it is practiced today (ie, connoting storage) is nothing like the original purpose (semantic documentation).

    Hungarian is incredibly useful for the small number of scenarios that it was designed to handle. For instance, many years ago I rewrote all of the win16/win32 ASCII/ANSI/DBCS/Unicode string libraries in VBScript so that every variable actually had its correct Hungarian prefix. Every count of bytes was a cb, every index by character was ich regardless of whether they were 1, 2 or DBCS characters, etc.

    I found SO MANY BUGS that way. Bugs that would have taken our testers weeks or months of painstaking work on far east operating systems to find.

    But Hungarian is actively harmful when it’s telling you stuff that you can see in the declaration.
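
To illustrate the cb / ich distinction from this comment, here is a minimal sketch that assumes fixed-width wchar_t characters (the DBCS case, where characters vary in width, is deliberately left out). The function names and the ib "index of a byte" tag are assumptions made for the example; cb, cch and ich are used the way the comment describes them.

```cpp
#include <cstddef>
#include <string>

// cch: count of characters; cb: count of bytes. For wide characters the
// two differ by a factor of sizeof(wchar_t), which is what the prefixes
// are meant to make visible.
std::size_t CbFromCch(std::size_t cch) {
    return cch * sizeof(wchar_t);
}

// ich: index of a character. Returns the byte offset (ib) of that
// character inside a buffer of wchar_t.
std::size_t IbFromIch(std::size_t ich) {
    return ich * sizeof(wchar_t);
}

// Bytes needed to store sz, including its terminating null character.
std::size_t CbNeededForSz(const std::wstring& sz) {
    std::size_t cch = sz.size() + 1;  // characters, plus the terminator
    return CbFromCch(cch);            // assigning a cch to a cb needs a conversion
}
```

A line such as cb = cch; is legal to the compiler but looks wrong on sight, because the prefixes disagree; that kind of mismatch is presumably what surfaced so many bugs.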

  37. DrPizza says:

    "I found SO MANY BUGS that way"

    Bet you’d have found even more with some actual strong typing.

  38. Eric Lippert says:

    To address your actual point, Hungarian isn’t really necessary in managed code. In unmanaged code the problem arises because it’s unclear whether a number is a count or an index or a maximum or a limit or what. And it’s sometimes unclear how indirected a pointer is. But in managed code, it’s pretty clear that name.Length is the length of a string, and there are seldom pointers to worry about. So what’s Hungarian for?

  39. DrPizza says:

    If you think that you need to be running managed code to remove the need for hungarian type annotations you really ought to learn some languages other than C.

  40. Brandon says:

    IMO, VS 2005’s Code Definition Window effectively replaces any benefits of hungarian notation. I try to use human readable, descriptive variable names and let the IDE help me with type lookups.

  41. Joku says:

    David has the point here, the right direction is assisting the developer and these notations should only be used when it’s the last choice.

    Meaningful class and property names and more smarts to the editor.