Hungarian notation – it’s my turn now :)



Following on the heels of Eric Lippert’s posts on Hungarian and of course Rory Blyth’s classic “Die, Hungarian notation… Just *die*”, I figured I’d toss my hat into the fray (what the heck, I haven’t had a good controversial post in a while).


One thing to keep in mind about Hungarian is that there are two totally different Hungarian implementations out there.


The first one, which is the one that most people know about, is “Systems Hungarian”.  Systems Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig” (Edit: For Scott’s side of this comment, see here – the truth is better than my original post).  In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi.


Both variants of Hungarian have two things in common.  The first is the concept of a type-related prefix, and the second is a suffix (although the Systems Hungarian doesn’t use the suffix much (if at all)).  But that’s where the big difference lies.


In Systems Hungarian, the prefix for a type is almost always related to the underlying data type.  So a parameter to a Systems Hungarian function might be “dwNumberOfBytes” – the “dw” prefix indicates that the type of the parameter is a DWORD, and the “name” of the parameter is “NumberOfBytes”.  In Apps Hungarian, the prefix is related to the USE of the data.  The same parameter in Apps Hungarian is “cb” – the “c” prefix indicates that the parameter is a count, the “b” type indicates that it’s a count of bytes.


Now consider what happens if the parameter is the number of characters in a string.  In Systems Hungarian, the parameter might be “iMaxLength”.  It might be “cchWideChar”.  There’s no consistency between different APIs that use Systems Hungarian.  But in Apps Hungarian, there is only one way of representing the parameter; the parameter would be “cch” – the “c” prefix again indicates a count, the “ch” type indicates that it’s a character.
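
To make the “cb” versus “cch” distinction concrete, here is a minimal sketch in C++.  The helper names (CchFromWsz, CbFromCch) and the “wsz” prefix for a wide null-terminated string are my own assumptions for illustration; they are not taken from any Hungarian specification or Win32 header.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helpers, named in an Apps Hungarian style for illustration only.

// cch: count of characters in a null-terminated wide string (terminator excluded).
std::size_t CchFromWsz(const wchar_t* wsz) {
    assert(wsz != nullptr);
    return std::wcslen(wsz);
}

// cb: count of bytes needed to store cch wide characters plus the terminator.
std::size_t CbFromCch(std::size_t cch) {
    return (cch + 1) * sizeof(wchar_t);
}
```

With wide characters the two counts differ, so passing a cch where a cb is expected (or the other way around) shows up in the names even though both are the same integer type.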


Now please note that most developers won’t use “cch” or “cb” as parameters to their routines in Apps Hungarian.  Let’s consider the Win32 lstrcpyn function:

LPTSTR lstrcpyn(
    LPTSTR lpString1,
    LPCTSTR lpString2,
    int iMaxLength
);

This is the version in Systems Hungarian.  Now, the same function in Apps Hungarian:

LPTSTR Szstrcpyn(
    LPTSTR szDest,
    LPCTSTR szSrc,
    int cbLen
);

Let’s consider the differences.  First off, the name of the function changed to reflect the type returned by the function – since it returns an LPTSTR, which is a variant of a string, the function name changed to “SzXxx”.  Second, the names of the first two parameters changed.  Instead of “lpString1” and “lpString2”, they changed to the more descriptive “szDest” and “szSrc”.  The “sz” prefix indicates that the variable is a null terminated string.  The “Dest” and “Src” are standard suffixes, which indicate the “destination” and “source” of the operation.  The iMaxLength parameter, which indicates the number of bytes to copy, is changed to cbLen – the “cb” prefix indicates that it’s a count of bytes, the standard “Len” suffix indicates that it’s a length to be copied.


The interesting thing that happens when you convert from Systems Hungarian to Apps Hungarian is that now the usage of all the parameters of the function becomes immediately clear to the user.  Instead of the parameter name indicating the type (which is almost always uninteresting), the parameter name now contains indications of the usage of the parameter.
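
As a rough sketch of how that reads inside a function body, here is a bounded copy written with Apps Hungarian names.  This is not the Win32 implementation: SzCopyN is a hypothetical name, and I’m assuming plain char strings so that a count of bytes and a count of characters are the same thing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the Apps Hungarian prototype, using plain char.
// Copies at most cbLen - 1 bytes from szSrc and always null-terminates szDest
// when cbLen is positive.  Returns szDest, hence the "Sz" at the front of the name.
char* SzCopyN(char* szDest, const char* szSrc, int cbLen) {
    assert(szDest != nullptr && szSrc != nullptr);
    if (cbLen <= 0) {
        return szDest;  // no room for anything, not even the terminator
    }
    int ib = 0;  // ib: index into an array of bytes
    while (ib < cbLen - 1 && szSrc[ib] != '\0') {
        szDest[ib] = szSrc[ib];
        ++ib;
    }
    szDest[ib] = '\0';
    return szDest;
}
```

Every name in the loop says how it is used: ib indexes bytes, cbLen bounds it, and the two sz parameters are known to be null terminated.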


The bottom line is that when you’re criticizing Hungarian, you need to understand which Hungarian you’re really complaining about.  Hungarian as defined by Simonyi isn’t nearly as bad as some have made it out to be.


This is not to say that Apps Hungarian was without issue.  The original Hungarian specification was written by Doug Klunder in 1988.  One of the things that was missing from that document was a discussion about the difference between “type” and “intent” when defining prefixes.  This can be a source of great confusion when defining parameters in Hungarian.  For example, if you have a routine that takes a pointer to a “foo” as a parameter, and internally the routine treats the parameter as a single pointer to a foo, it’s clear that the parameter name should be “pfoo”.  However, if the routine treats the parameter as an array of foos, the original document was not clear about what should happen – should the parameter be “pfoo” or “rgfoo”?  Which wins, intent or type?  To me, there’s no argument, it should be intent, but there have been some heated debates about this over the years.  The current Apps Hungarian document is quite clear about this: intent wins.
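
Here is a small sketch of the “intent wins” rule.  FOO, ResetFoo and SumFoos are hypothetical names I made up for the example; the point is only that both routines take the same C++ type and differ in how they use it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type for illustration.
struct FOO {
    int n;
};

// Treated as a pointer to exactly one FOO, so the parameter is pfoo.
void ResetFoo(FOO* pfoo) {
    assert(pfoo != nullptr);
    pfoo->n = 0;
}

// Same C++ type (a FOO pointer), but the routine walks it as an array,
// so intent says rgfoo.  cfoo is a count of FOOs, ifoo an index into them.
int SumFoos(const FOO* rgfoo, std::size_t cfoo) {
    assert(rgfoo != nullptr || cfoo == 0);
    int nTotal = 0;
    for (std::size_t ifoo = 0; ifoo < cfoo; ++ifoo) {
        nTotal += rgfoo[ifoo].n;
    }
    return nTotal;
}
```

The compiler sees no difference between the two parameters; only the prefix tells the caller whether one element or many will be touched.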


One other issue with the original document was that it predated C++.  So concepts like classes weren’t really covered and everyone had to come up with their own standard.  At this point those issues have been resolved.  Classes don’t have a “C” prefix, since a class is really just a type.  Members have “m_” prefixes before their actual name.  There are a bunch of other standard conventions but they’re relatively unimportant.
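
A minimal sketch of those two class conventions follows.  The Buffer class is hypothetical, and using an “F” prefix on a function that returns a flag is my assumption about how the usual “f” prefix would carry over to a method name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class: no "C" prefix on the type name, "m_" on every data member,
// followed by the ordinary Hungarian prefix.
class Buffer {
public:
    explicit Buffer(std::size_t cbMax) : m_cbMax(cbMax), m_cbUsed(0) {}

    // Reserves cb bytes and returns true if they fit; otherwise changes nothing.
    bool FReserve(std::size_t cb) {
        assert(m_cbUsed <= m_cbMax);  // class invariant
        if (cb > m_cbMax - m_cbUsed) {
            return false;
        }
        m_cbUsed += cb;
        return true;
    }

    std::size_t CbUsed() const { return m_cbUsed; }

private:
    std::size_t m_cbMax;   // capacity, as a count of bytes
    std::size_t m_cbUsed;  // bytes reserved so far
};
```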


I used Hungarian exclusively when I was in the Exchange team; my boss was rather a Hungarian zealot and he insisted that we code in strict Apps Hungarian.  Originally I chafed at it, having always assumed that Hungarian was stupid, but after using it for a couple of months, I started to see how it worked.  It certainly made more sense than the Hungarian I saw in the Systems division.  I even got to the point where I could understand what an irgch was without even flinching.


Now, having said all that, I don’t use Hungarian these days.  I’m back in the systems division, and I’m using a home-brewed coding convention that’s based on the CLR standards, with some modifications I came up with myself (local variables are camel cased, parameters are Pascal cased (to allow easy differentiation between parameters and local variables), class members start with _ as a prefix, globals are g_Xxx).  So far, it’s working for me.
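
For comparison, here is roughly what that convention looks like in C++.  The Writer class and g_TotalBytes are hypothetical names for illustration, not code from any real project.

```cpp
#include <cassert>
#include <cstddef>

// Global: g_Xxx.
std::size_t g_TotalBytes = 0;

class Writer {
public:
    // Parameter is Pascal cased (ByteCount); the local is camel cased (newTotal).
    void Write(std::size_t ByteCount) {
        std::size_t newTotal = _bytesWritten + ByteCount;
        assert(newTotal >= _bytesWritten);  // guard against wrap-around
        _bytesWritten = newTotal;
        g_TotalBytes += ByteCount;
    }

    std::size_t BytesWritten() const { return _bytesWritten; }

private:
    // Class member: leading underscore followed by a lowercase letter.
    std::size_t _bytesWritten = 0;
};
```

Scope is visible at a glance (parameter, local, member, global), while the type is left to the declaration.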


I’ve drunk the kool-aid from both sides of the Hungarian debate though, and I’m perfectly happy working in either camp.


 

Comments (35)

  1. Mike Dunn says:

    How is that blog entry a "classic"? He’s belittling people who use Hungarian improperly, not talking about Hungarian itself.

    I personally HATE reading code w/o proper Hungarian. As Eric Lippert has been saying lately, reading code is harder than writing code. When I read code w/o Hungarian, I always find myself paging up to find the type of a variable, then paging back down and (after finding my place) resuming reading the code.

    Proper prefixes also keep the developer honest about matching types (signed vs. unsigned; MBCS vs. Unicode strings, etc.) which helps prevent bugs related to mismatched types. Sure, the compiler MIGHT warn you about them, if you have the warning level high enough… but we all know people who ignore (or worse, turn off) warnings, and besides anything that helps find bugs earlier is a Good Thing in my book.

    Rory also asks why use Hungarian "in this day and age of the superpowered IDE". Well, you don’t always have your IDE to hold your hand. I often skim code in an 80-column 4NT window because it’s faster than using VC.

    In any case, it’s another case of someone bashing bad Hungarian and then condemning Hungarian altogether, without recognizing the benefits that good Hungarian brings.

  2. Your points are absolutely valid. But Rory’s "classic" made me laugh, which is always a good thing.

    You’re absolutely right his point was to bash bad hungarian and throw the baby out with the dishtowel (purposely mixing metaphors).

    I’m hoping to write about the negatives in Hungarian sometime in the future (maybe tomorrow), they can be quite significant actually, which is why I don’t code in it these days. Especially with Systems Hungarian, where the Hungarian represents the type, not the intent.

    What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul. Also, what’s the thing about the difference between dw and ul? Why do you differentiate? Apps Hungarian actually forbids the use of dw because it’s a compiler/hardware specific type.

  3. I’ve seen this growing fashion for using an underscore prefix for member variables. Sutter’s "Exceptional C++" uses it extensively, for example.

    Frankly, I don’t like it. As I understand it, the C/C++ standards state that identifiers beginning with an underscore are reserved for the use of the language/library implementer.

    Thus, by using an underscore prefix, you’re explicitly allowing your implementation to break your code — by defining macros with underscores, for example.

    Do you have any comments about this?

  4. An interesting point. I’m trying to avoid the m_Xxx thingy because it’s too "mfc-ish" for my tastes, do you have an alternative suggestion?

  5. Scott says:

    Yeah, I don’t think you can really hold any Rory Blyth post up as a "classic" except as a "classic Rory Blyth post".

    People who make the argument that you should do away with a programming notation because of the IDE you are using aren’t very good programmers. If you are leaning on the IDE too much, you don’t understand the language/framework you are using well enough.

    People who say Hungarian notation makes the code harder for them to read have a point. It may be harder for THEM to read the code. I knew a guy once who couldn’t make heads or tails of SQL stored procedures unless it was indented with line breaks a certain way. He had the same problem with variable declarations, they all had to be on a separate line.

    I still use those old VB/VBA (I don’t remember where they came from) guidelines for naming controls on your form. e.g. btnGo instead of just go. Say you are designing a winform and you have a text box for the name and a label for the "name" textbox. I’d name the label "lblName" and the textbox "txtName". Which is "pointless Hungarian" (as Eric L so named it). Given that in ASP.NET the form markup where the control is declared and the code behind that responds to the control are separate, it makes sense to me to use some kind of prefix (or suffix like "nameTextBox" except that I hate typing THAT much just for a variable name) to tell me what type the control is. Plus it allows me to semantically group controls together that are related. Say, for a search control that allows the user to input some text and then choose whether or not to search the web or just "this site" using a drop down. The three controls I would need are a Button, TextBox, and a DropDownList. btnSearch, txtSearch, ddlSearch (or cboSearch). How does Apps Hungarian handle widgets?

  6. Scott says:

    "What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul."

    Most every text editor supports a "find and replace" function that easily lets me change the variable names. Only Visual Studio allows me to hover over a var name and find out the type (which doesn’t always work and sometimes requires a restart to make it work) or lets me right-click on a var/method and "go to definition" (the best trivial IDE function I’ve ever seen)

  7. No, I don’t have an alternative solution. Since my day job still requires me to use (and abuse) MFC, I tend to stick with m_.

    Is there any real reason (other than that it’s an MFC-ism) that you don’t like the m_?

    I’m starting to do a bit of C# coding these days, and I’m trying to stick with the coding guidelines (from MSDN), which seem to imply that member variable names should look likeThis.

    Frankly, I miss my m_.

    I started using Hungarian naming religiously back in the days of Windows 3.0, but I’ve mellowed to the point that I think the only useful prefixes are m_ and g_, to signify scope, rather than type.

  8. The C# coding guidelines are intended for public classes, which is why they don’t differentiate between fields and non-fields – in fact if you run fxcop, it complains about having public fields.

    I agree that m_ and g_ are the only carryovers, I may start using them again 🙂

  9. Don Newman says:

    I have to admit that go to definition is the best feature since the pop-up of an object’s methods and properties.

  10. Centaur says:

    My copy of Exceptional C++ uses an underscore for data members, but as a suffix, not prefix, which is ok as far as the standard is concerned.

    I have also seen (and used) a convention where instance data members start with “its” (itsName) and class data members start with “their” (theirCount).

    Although, in C++ on Win32 it’s hard to use any naming conventions, with STL using lowercase for almost everything, and Win32 using Pascal case for functions and all caps for data types. (I once worked with a guy who consistently named (non-POD) classes with all caps… I almost heard the code scream every time I looked at it!)

  11. Cesar Eduardo Barros says:

    > As I understand it, the C/C++ standards state that identifiers beginning with an underscore are reserved for the use of the language/library implementer.

    "In addition to the names documented in this manual, reserved names include all external identifiers (global functions and variables) that begin with an underscore (_) and all identifiers regardless of use that begin with either two underscores or an underscore followed by a capital letter are reserved names. This is so that the library and header files can define functions, variables, and macros for internal purposes without risk of conflict with names in user programs."

    http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

    So, a member name beginning with an underscore and a lowercase letter would be safe, since it’s not a global symbol (and macros would use two underscores or an underscore and an uppercase letter).

    Of course, you also have to avoid all the hundreds of function and variable names defined in the standards, because any of them can be implemented as a macro (for instance, errno). Some other prefixes and suffixes are reserved too; see the page above for details (no source code, so probably Safe For Microsoft Employees).

    I think there’s no need to be that pedantic; if you can’t compile something, it means you have the source code for that something, and can edit it to remove any incompatibilities. Of course, I avoid naming my types something_t, since I know it’s reserved, but I don’t check every name I create with the manual or the standards.

  12. Actually in the Exchange Store, the classes are all in all caps (OFOLD, OMSG, etc). They’re used internally as all lowercase (pofoldFoo, etc).

  13. Sebastian says:

    I use F as prefix for data members and A as prefix for parameters. It’s something that’s left over from Borland’s TV/OWL/VCL coding standards (which have been close to the same since the very early 90’s). I don’t like m_, g_ or anything else with an underscore in it. There’s really no need for an underscore unless you can’t find the shift-key on your keyboard and write everything in lower or UPPER case, neither of which I want to see in my code.

  14. David Candy says:

    The main problem is that it makes names too long and too similar in shape to other names. They then need to be read rather than recognised. Fluent readers recognise word shapes; poor readers try to spell out each word, which takes so long that they forget the previous words and so can’t extract meaning from a sentence (people who can’t read at all are a very small share of illiterate people).

    So in short. It sounds like a good idea but not for humans. I’ll also speak for the cat, it can’t make heads or tails of it either.

  15. Steven C. says:

    I second the comment about go to definition / xref browsing being the best reason to use an IDE over just text editors. I honestly have no idea how people can debug / code in large source bases without them. 😛

    Personally, I use "the" as my prefix for member variables — although this is mostly because I hate typing underscores (otherwise I’d probably use a leading underscore). Either way, I agree with the gist of the article that Apps hungarian is FAR more sensible than System hungarian. Again, though, I’m coming from the stance that I read code in my debugger which gives me right click type information, goto def, and xref browsing etc. YMMV.

  16. Simon Cooke [exMSFT] says:

    Hungarian Notation Lite®

    Since bitching (or otherwise) about Hungarian notation appears to be a common pastime right now, I thought I’d shove my oar in and deliver my 2 cents…

    http://www.accidentalscientist.com/2004/06/hungarian-notation-lite.html

  17. Everyone nowadays likes to throw away tried and true practices because… well, they can.

    Hungarian has its uses in a bunch of cases where the type system lacks information.

    int *Foo;

    is that a pointer to an int or an array/vector of ints? It’s clear in these cases:

    int *prgnElements;

    int **prgprgiCurrentPositions;

    int nElements;

    differentiating between pointers to singletons and arrays is very useful. Differentiating whether something is in an index or a count is useful. Differentiating between a count of bytes and a count of characters is useful.

    But I guess a bunch of people from other companies didn’t invent it so we have to throw away the good with the bad.

    The "apps hungarian" tyranny was stupid. Having to scroll up and down constantly to try to find the nature of an identifier is also stupid.

  18. "My copy of Exceptional C++ uses an underscore for data members, but as a suffix, not prefix"

    Typical, the one page that I choose to base my complaint on (Item 20) seems to be the only page in the entire book to use an underscore prefix 🙂

    I’ll concede the point about leading underscores only being reserved for global identifiers, though.

  19. Mike Dimmick says:

    Hungarian’s useful in situations where the type system isn’t strong enough to express your intent fully. In C, practically anything can be promoted to anything else with no casts, and strings and arrays aren’t first-class types (or even types at all in the case of strings) so you need some way for the programmer to be able to see whether the operations performed are in fact correct – because the compiler can’t help you.

    However, there’s little need to be warty when using user-defined types (structs) except around the implicit conversion between any pointer type and void*.

    In C++, there’s even less need because there is no implicit conversion from void* – you must use a cast. A C++ program will tend to have less typeswitching and fewer peculiar casts due to the use of polymorphism.

    The ultimate in static type systems still has to be Ada, in which you can define new integer types that don’t have implicit conversions between themselves, the built-in Integer type, or any other integer types, and you can also define range-restrictions of types (keeping the implicit conversions of its parent type). The main problems with Ada are that its interop with C requires writing declarations, its syntax (derived from or inspired by Pascal) is verbose (due to a requirement to be largely LL(1) parsable) and that the object-oriented extensions of Ada 95 aren’t. That is, you type Fn( obj, arg1, arg2 ) rather than obj.Fn( arg1, arg2 ).

    In Ada, warting is completely unnecessary. Instead you should define new types. It doesn’t completely prevent errors (there was a space project where a lander completely missed its target because the programmers were working in traditional units while the scientists were working in metric) but it can be helpful.

    Turning to more practical languages <g> in C# it’s also largely unnecessary to wart, although the index/count difference can be necessary. If you’re programming in VB.NET, no warts are required if you turn on Option Strict.

    (to wart: to decorate your variables and parameters with the type; warts: the decorations themselves)

  20. Florian says:

    "What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul."

    Exactly. Which is precisely the reason that I encode type in variable names in my personal notation, which seems to be similar to Systems Hungarian. If I change the type of a variable, I need to check every line where that variable is used to see whether I accidentally introduced hidden bugs by the type change. And I want the compiler to complain loudly should I have missed an occurrence, which it will do if the name of the variable also changed. As Mike mentioned, in C/C++ the change of type is not necessarily enough to get a compiler error or warning.

    The problem is that people who argue that Hungarian notation, especially Systems, is not needed anymore tend to leave out "if you code with MSVS or a similar IDE in C#, VB or Java for the Wintel platform". If you code in C/C++ for Embedded, probably with an editor without MS gadgets, it’s a whole different story and other rules apply, IMHO.

  21. Scott Ludwig says:

    "The first one, which is the one that most people know about, is “Systems Hungarian”.  System’s Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig”.  In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi."

    Hi Larry. Now I’m famous :). The story is that the hungarian bastardization originally came from the documentation folks. In the systems group we originally produced raw documentation for them that had standard "apps like" hungarian. They decided it was too obtuse for documentation so they did some serious readability changes to it. They are not programmers so this wasn’t a graceful operation. This had a *huge* secondary effect because new programmers in the systems group would read the documentation and "more or less" reproduce that "documentation group" style. Not to mention books were written about the api that referenced that style. Pretty soon we had more code in this "docs group" style than in any other style.

    A smaller effect came from Win32 birthing. Many new kernel32 apis were created and what you see in those apis is MarkL’s personal interpretation of what he read as the style in the documentation. Many "CountOfBytes" instead of "cb", "IndexOfX" instead of ix, etc.

    Good to hear you are still cranking away. I hope all is going well with you.

  22. Wow! Thanks Scott for the clarification.

    I apologize for taking your name in vain, btw, I didn’t have contact info for you so I didn’t check it with you. I REALLY appreciate the clarification.

    And yes, I’m still cranking away here, I’m over in multimedia land (talk about strange journeys) but I’m still having fun. It’s scary, I hit 20 years in 2 months.

  23. Eric Lippert says:

    Dude, that was NINE months ago. How is this _possibly_ following on the heels? Just how big do you think my heels are?

    🙂

  24. Mike Woodhouse says:

    One of the best arguments for HN in the "old" days was to reduce the need to PageUp to see variable declarations. Two (unrelated?) factors that render this less necessary/cumbersome are (1) large, hi-res displays that let the developer see more lines – I can easily see 60-80 LOC on my screen right now; and (2) "improved" programming practices/languages leading to smaller, more cohesive routines, where definition and usage are extremely close.

    Where the definition of a variable cannot be local (forms controls are the most obvious) I still feel the need to see some qualifier for ease-of-comprehension. That said, these days I prefer UserNameTextBox to txtUserName…

  25. Serge Wautier says:

    Larry,

    Thanks to you, I learned today that i’ve been ‘doing’ Apps Hungarian forever. I didn’t even know it had a name 😉

    My personal rules are pretty much consistent with Simon’s Lite rules (Although I have a few more ones).

    I tend to specify the type only if it has some importance e.g. I have a WORD variable, there’s most likely a reason why it’s not simply an int. So people who read the code should better be aware of it.

    But in most cases, my Hungarian prefixes are just abbreviations for common words that should otherwise appear in the variable name. e.g. I truly hate nNumberOfBytes. cBytes does a way better job IMHO.

  26. I was bored this weekend so I ended up trawling through a bunch of blog archives and came across posts