What’s Up With Hungarian Notation?

I mentioned Hungarian Notation in my last post — a topic of ongoing religious controversy amongst COM developers. Some people swear by it, some people swear about it.

The anti-Hungarian argument usually goes something like this:

“What is the point of having these ugly, hard-to-read prefixes in my code which tell me the type? I already know the type because of the declaration! If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code. The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache.”

For a long time I was mystified by this argument, because that’s not how I use Hungarian at all. Eventually I discovered that there are two completely contradictory philosophical approaches to Hungarian Notation. Unfortunately, each can be considered “definitive”, and the bad one is in widespread use.

The one I’ll call “the sensible philosophy” is the one actually espoused by Charles Simonyi in his original article. Here’s a quote from Simonyi’s paper:

The basic idea is to name all quantities by their types. […] the concept of “type” in this context is determined by the set of operations that can be applied to a quantity. The test for type equivalence is simple: could the same set of operations be meaningfully applied to the quantities in questions? If so, the types are thought to be the same. If there are operations that apply to a quantity in exclusion of others, the type of the quantity is different. […] Note that the above definition of type […] is a superset of the more common definition, which takes only the quantity’s representation into account. Naturally, if the representations of x and y are different, there will exist some operations that could be applied to x but not y, or the reverse.

(Emphasis added.)

What Simonyi is saying here is that the point of Hungarian Notation is to extend the concept of “type” to encompass semantic information in addition to storage representation information.

There is another philosophy which I call “the pointless philosophy”. That’s the one espoused by Charles Petzold in “Programming Windows”. On page 51 of the fifth edition he says:

Very simply, the variable name begins with a lowercase letter or letters that denote the data type of the variable.  For example […] the i prefix in iCmdShow stands for “integer”.

And that’s all! According to Petzold, Hungarian is for connoting the storage type of the variable.

All of the arguments raised by the anti-Hungarians (with the exception of “it’s ugly”) are arguments against the pointless philosophy! And I agree with them: that is in fact a pointless interpretation of Hungarian notation which is more trouble than it is worth.

But Simonyi’s original insight is extremely powerful! When I see a piece of code that says

iFoo = iBar + iBlah;

I know that there are a bunch of integers involved, but I don’t know the semantics of any of these. But if I see

cbFoo = cchBar + cbBlah;

then I know that there is a serious bug here! Someone is adding a count of bytes to a count of characters, which will break on any Unicode or DBCS platform. Hungarian is a concise notation for semantics like “count”, “index”, “upper bound”, and other common programming concepts.
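Here is a minimal C++ sketch that makes the cb/cch distinction concrete. The function names `CbFromCch` and `CbStringWithNull` and the four-byte length header are hypothetical, invented only for this illustration. The point is that once every quantity carries its prefix, a byte count added to a character count stands out at a glance.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Prefixes used here, as in the post: cch = count of characters, cb = count of bytes.

// Hypothetical helper: the one sanctioned place where characters become bytes.
std::size_t CbFromCch(std::size_t cch) {
    return cch * sizeof(wchar_t);
}

// Hypothetical example: bytes needed to store a wide string behind an assumed
// 4-byte length header, including the terminating null character.
std::size_t CbStringWithNull(const wchar_t* sz) {
    const std::size_t cbHeader = 4;             // assumed header size, in bytes
    std::size_t cchString = std::wcslen(sz);    // std::wcslen counts characters
    // Buggy version: return cbHeader + cchString + 1;
    // "cb + cch" looks wrong, and it undercounts whenever sizeof(wchar_t) > 1.
    return cbHeader + CbFromCch(cchString + 1); // cb + cb: the units agree
}
```

On a platform where `wchar_t` is two bytes (as on Windows), `CbStringWithNull(L"abc")` returns 4 + 8 = 12. The buggy version would return 8.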

In fact, back in 1996 I changed every variable name in the VBScript string library to have its proper Hungarian prefix. I found a considerable number of DBCS and Unicode bugs just by doing that, bugs which would have taken our testers weeks to find by trial and error.

By using the semantic approach rather than the storage approach we eliminate the anti-Hungarian arguments:

I already know the type because of the declaration!

No, the Hungarian prefix tells you the semantic usage, not the storage type. A cBar is a count of Bars whether the storage is a ushort or a long.

If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code.

Annotate the semantics, not the storage. If you change the semantics of a variable then you need to also change every place it is used!

The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache.

But the benefit of knowing that you will never accidentally assign indexes to counts, or add apples to oranges, is worth it in many situations.
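Semantic prefixes also form a small algebra: an index plus a count is an index, and a count of characters times the character size is a count of bytes. The hypothetical `IchAppend` below (a name invented for this sketch, not taken from any real library) shows how the prefixes let you check each expression as you read it. Note that `std::memcpy` takes a byte count, so passing `cchSrc` there would be exactly the cb/cch bug described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper. Copies cchSrc characters from rgchSrc into rgchDest,
// starting at index ichDest, and returns the index just past the copied text.
// The caller must ensure rgchDest has room for ichDest + cchSrc characters.
std::size_t IchAppend(wchar_t* rgchDest, std::size_t ichDest,
                      const wchar_t* rgchSrc, std::size_t cchSrc) {
    std::size_t cbSrc = cchSrc * sizeof(wchar_t);     // cch * size -> cb
    std::memcpy(rgchDest + ichDest, rgchSrc, cbSrc);  // memcpy wants bytes, so pass cb
    return ichDest + cchSrc;                          // ich + cch -> ich
}
```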

UPDATE: Joel Spolsky has written a similar article: Making Wrong Code Look Wrong. Check it out!

Comments (22)

  1. dave sanderman says:

    another comment about ‘I already know the type because of the declaration’ is: often the declaration and use are distant from each other. the benefit of using hungarian in that case is that you don’t have to look at a line of code and then grovel around to find out how the vars are declared.

  2. Eric Lippert says:

    Though you make a good point, I’d counter that by saying that (a) many developers now use some pretty sophisticated editors that can find the declaration very quickly, and (b) I try to keep my routines under three screens long. If it’s a local variable, the type is nearby. If it’s a member variable or a global variable then sure, sometimes it is a pain to find the storage type, but really, how often do you care whether that counter is a UINT or a DWORD?

  3. ZorbaTHut says:

    I’ve always thought that it would be handy to make that sort of thing a compile-time error. Describe the set of prefixes you plan to use and how they can be associated with each other, then make the compiler scream at you if you do something Wrong.

  4. Mike Dimmick says:

    There’s always the Ada way of doing things: creating new integer types, and new subtypes. However, I suspect that most of us don’t have the patience for that. As far as I’m aware, there aren’t any mainstream languages apart from Ada that allow us to declare the exact valid range of an integer variable, or differentiate apples from oranges within the type system (i.e. with support from the compiler).

    There’s less need for Hungarian prefixes in a strong type system like Ada’s, or the user-defined type systems of C++ – because the compiler can tell you you’re making mistakes.

    However, if you have conversion operators and alternate constructors in C++, there’s a chance of introducing type errors.

  5. Ed Ball says:

    My primary argument for Hungarian has to do with making code easier to read. Even if the declaration of a local/member/global variable is just a screen away, that’s still too far if I’m trying to quickly figure out this line of code where the debugger dropped me.

    My philosophy for Hungarian notation actually falls between your "sensible" and "pointless" philosophies. I don’t look to Hungarian to tell me whether an integer is 16- or 32-bit, signed or unsigned, but I also don’t look to it to tell me much semantic information — I figure the actual variable name is good for that. I do want Hungarian to tell me that the variable is an integer (versus a real number, or a Boolean, or a rectangle, or a pen, or whatever). So "nChildren" is probably the number of children, whereas "bChildren" is probably true if there are any children, and "astrChildren" is probably an array of their names. It communicates enough of the type to know what the basic operations on the variable are, and, combined with the variable name, gives solid semantic information.

  6. Tim Scarfe says:

    As no-one has commented on this yet, I will note that there is no argument against using it in a language like JScript where there are variants involved. It makes the code 10 times easier to read and look pretty cool to boot :)

  7. Dave Anderson says:

    << …there is no argument against using it in a language like JScript where there are variants involved. It makes the code 10 times easier to read and look pretty cool to boot… >>

    Ugh. JScript is the WORST kind of language in which to use Hungarian notation, as there is no way to enforce typing.

    And I disagree with claims that Hungarian Notation makes code easier to read.

  8. I love hungarian for all the right reasons. I agree with the blogger’s comments, but you’ve got to start somewhere…

    http://CodeInsight.com/Docs/Programming Standards.doc

  9. Jeff Clark says:

    If you do use hungarian it also seems like you need to set up some standards so everyone uses the SAME prefixes (however you plan on using them). If you have multiple people using different prefixes for the same thing then you still end up having to go look at the definition because you don’t know what the stupid thing is. Also we’d always run into problems with people manufacturing prefixes for classes they create so you end up with wpbmObject or something indecipherable like that.

    I used to use it but we hit so many problems due to people making up their own rules that we don’t use it anymore. At first I was pretty skeptical about not using it but as long as you take care and name things appropriately it’s not really missed. Of course if people named things appropriately in the first place then using it would probably not have caused as many problems as it did.

  10. Chris says:

    Hungarian doesn’t have any place in production code. It goes beyond "messy" or "dirty". Most variable names imply their data type: ID is int, Name is string, and so on. For complex data types, you get into trouble when people start using their own abbreviations. For class People, does it abbreviate to pplPerson? Or plePerson? And what if there’s a pointer, ppple? Also, most IDEs nowadays (especially VS) provide abundant hover-over information that tells you the data type, among other things.

    Redundancy is bad.

  11. Bruno says:

    Hungarian notation is only useful if you name your variables things like "blah" and "foo". If you give your variables good, descriptive names, this is a complete non-issue.

  12. Bruce Leggett says:

    Hungarian Notation is psychologically inefficient and counterproductive. First, you must learn all the prefixes (this could be a burdensome task in some cases)! Then, after you "Hungarianize" your code, every time you maintain it you have to decode and dissect those "Hungarianized" variables. Both are an unnecessary mental blow and a loss of productivity. Use descriptive names, as Bruno stated. Period.

  13. Robert Brown says:

    As an embedded systems programmer (C), I find Hungarian notation to be extremely beneficial. By "embedded" I mean true embedded programming, where an oscilloscope is often used to debug software. Hungarian is a fabulously quick way to assess whether you’re mixing fixed-point math with floating-point math. Further, it gives a very quick way to determine whether you’re trying to do mixed integer math between longs and ints (for example), which can cause rollover problems. When implementing software filters that need to run fast (fixed-point) yet need to be reconciled with human-readable data (floating-point), Hungarian is a really powerful way to ensure your math makes sense. One glance tells you if you’re trying to right-shift a float (oops). Using different descriptive variable names in the embedded world often becomes too cumbersome: iVoltage and fVoltage (Hungarian) are a lot more readable than VoltageFromADconverter and VoltageForUserInterface. Hungarian notation has helped me catch dangerous bugs before software deployment (for example, erasing on-chip flash memory 65535 times instead of just once when mixing longs and ints; given that flash devices can survive somewhere around 10k to 100k write cycles before they die, you can see the value in catching this type of bug quickly!).

    That being said, I can completely understand not using Hungarian for more abstracted languages like Java, C#, etc. The lines between atomic data types are blurred with these languages (which is good for these languages because it makes for easier, faster programming), so type-checking has less serious consequences.

    I think it comes down to applying Hungarian where it makes sense. I never devise long cryptic Hungarian prefixes for complex data structures (for such cases, the name is descriptive enough). Hungarian really helps to prevent type-mixing during mixed math, and to specify scope for improved readability. In any given system, I often will use a platform with Hungarian (C in the embedded processor) and without Hungarian (C# in the PC-based user interface software).

    Just use the right tool for the job if it helps.

  14. I was bored this weekend so I ended up trawling through a bunch of blog archives and came across posts

  15. To celebrate the 5th anniversary of this post…

    If a variable stores the count of something, put ‘count’ in its name. That way, you get a more readable version of the semantic indicator and no one can accuse you of being Hungarian. Modern IDEs solve the problem of typing long names (auto completion) and also of looking up the representation (Find declaration…)

  16. theBoringCoder says:

    Wow…what I learned Hungarian Notation is…years ago in college…was based on a definition similar to Petzold's…which is wrong. No wonder I never understood its purpose.

  17. theBoringCoder says:

    True Hungarian Notation is only useless to those of us who develop in modern tools where variable names are allowed to be long–like 30 or 40 characters.  If you were coding with tools where your variable names needed to be…let's say…16 characters or less…I can see the usefulness.
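ZorbaTHut's compile-time suggestion and Mike Dimmick's point about Ada can be approximated in C++. The idea is to wrap each semantic unit in its own type, so that the compiler, rather than a naming convention, rejects `cbFoo = cchBar + cbBlah`. This is only a minimal sketch under stated assumptions: `Cch`, `Cb`, and `CbFromCch` are hypothetical names, and only addition is defined. The constructors are `explicit` to close the implicit-conversion loophole Mike mentions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper types: a count of characters and a count of bytes are
// distinct types, so mixing them is a compile-time error. 'explicit' stops a
// bare std::size_t from silently turning into either one.
struct Cch {
    explicit Cch(std::size_t v) : value(v) {}
    std::size_t value;
};

struct Cb {
    explicit Cb(std::size_t v) : value(v) {}
    std::size_t value;
};

// Only like-with-like addition is defined.
inline Cch operator+(Cch a, Cch b) { return Cch(a.value + b.value); }
inline Cb operator+(Cb a, Cb b) { return Cb(a.value + b.value); }

// The single sanctioned conversion from characters to bytes.
inline Cb CbFromCch(Cch cch) { return Cb(cch.value * sizeof(wchar_t)); }

// Cb cbFoo = cchBar + cbBlah;            // does not compile: no operator+(Cch, Cb)
// Cb cbFoo = CbFromCch(cchBar) + cbBlah; // compiles: both operands are Cb
```

This buys the guarantee Eric describes, enforced by the compiler. The cost is the declaration overhead Mike suspects most of us lack the patience for.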