Hungarian notation - it's my turn now :)

06/22/2004

Following on the heels of Eric Lippert’s posts on Hungarian and of course Rory Blyth’s classic “Die, Hungarian notation… Just *die*”, I figured I’d toss my hat into the fray (what the heck, I haven’t had a good controversial post in a while).

One thing to keep in mind about Hungarian is that there are two totally different Hungarian implementations out there.

The first one, which is the one that most people know about, is “Systems Hungarian”. Systems Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig” (Edit: For Scott's side of this comment, see here - the truth is better than my original post). In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi.

Both variants of Hungarian have two things in common. The first is the concept of a type-related prefix, and the second is a suffix (although Systems Hungarian doesn’t use the suffix much, if at all). But that’s where the big difference lies.

In Systems Hungarian, the prefix for a type is almost always related to the underlying data type. So a parameter to a Systems Hungarian function might be “dwNumberOfBytes” – the “dw” prefix indicates that the type of the parameter is a DWORD, and the “name” of the parameter is “NumberOfBytes”. In Apps Hungarian, the prefix is related to the USE of the data. The same parameter in Apps Hungarian is “cb” – the “c” prefix indicates that the parameter is a count, the “b” suffix indicates that it’s a byte parameter.

Now consider what happens if the parameter is the number of characters in a string. In Systems Hungarian, the parameter might be “iMaxLength”. It might be “cchWideChar”. There’s no consistency between different APIs that use Systems Hungarian. But in Apps Hungarian, there is only one way of representing the parameter; the parameter would be “cch” – the “c” prefix again indicates a count, the “ch” type indicates that it’s a character.
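To make the cb/cch distinction concrete, here is a minimal C sketch. The helper names (CbFromCch, CchFromCb, CchOfSz) are hypothetical, made up for this illustration; they aren't part of Win32 or of any Hungarian document. The sketch assumes wide-character (wchar_t) strings, where a count of characters and a count of bytes are different numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert a count of characters (cch)
   into a count of bytes (cb) for a wide-character buffer. */
size_t CbFromCch(size_t cch)
{
    return cch * sizeof(wchar_t);
}

/* Hypothetical helper: convert a count of bytes (cb) back into
   a count of whole wide characters (cch). */
size_t CchFromCb(size_t cb)
{
    return cb / sizeof(wchar_t);
}

/* Hypothetical helper: count the characters in a null-terminated
   string (sz), not including the terminator. */
size_t CchOfSz(const wchar_t *sz)
{
    size_t cch = 0;
    assert(sz != NULL);
    while (sz[cch] != L'\0') {
        cch++;
    }
    return cch;
}
```

With names like these, an allocation such as malloc(cch) for a wide string looks wrong at a glance, while malloc(CbFromCch(cch + 1)) reads correctly: the prefix says which unit the number is in.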

Now please note that most developers won’t use “cch” or “cb” as parameters to their routines in Apps Hungarian. Let’s consider the Win32 lstrcpyn function:

  LPTSTR lstrcpyn(
      LPTSTR lpString1,
      LPCTSTR lpString2,
      int iMaxLength
  );

This is the version in Systems Hungarian. Now, the same function in Apps Hungarian:

  LPTSTR Szstrcpyn(
      LPTSTR szDest,
      LPCTSTR szSrc,
      int cbLen
  );

Let’s consider the differences. First off, the name of the function changed to reflect the type returned by the function – since it returns an LPTSTR, which is a variant of a string, the function name changed to “SzXxx”. Second, the names of the first two parameters changed. Instead of “lpString1” and “lpString2”, they changed to the more descriptive “szDest” and “szSrc”. The “sz” prefix indicates that the variable is a null-terminated string. The “Src” and “Dest” are standard suffixes, which indicate the “source” and “destination” of the operation. The iMaxLength parameter, which indicates the number of bytes to copy, is changed to cbLen – the “cb” prefix indicates that it’s a count of bytes, and the standard “Len” suffix indicates that it’s a length to be copied.

The interesting thing that happens when you convert from Systems Hungarian to Apps Hungarian is that now the usage of all the parameters of the function becomes immediately clear to the user. Instead of the parameter name indicating the type (which is almost always uninteresting), the parameter name now contains indications of the usage of the parameter.
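Here is a rough sketch of what a routine written this way might look like in plain C. SzCopyN is a hypothetical name invented for this example (it is not the real lstrcpyn), and it assumes narrow char strings so that a count of bytes and a count of characters are the same number. The behavior is modeled loosely on a bounded copy that always null-terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded string copy in Apps Hungarian style.
   szDest: destination buffer, null-terminated on return
   szSrc:  null-terminated source string
   cbLen:  size of szDest in bytes, including the terminator
   Returns szDest, which is why the function name starts with "Sz". */
char *SzCopyN(char *szDest, const char *szSrc, int cbLen)
{
    int ib = 0; /* "i" = index, "b" = byte */

    assert(szDest != NULL && szSrc != NULL);
    if (cbLen <= 0) {
        return szDest; /* no room for anything, leave buffer alone */
    }
    /* Copy at most cbLen - 1 bytes, leaving room for the terminator. */
    while (ib < cbLen - 1 && szSrc[ib] != '\0') {
        szDest[ib] = szSrc[ib];
        ib++;
    }
    szDest[ib] = '\0';
    return szDest;
}
```

Every name in the body says what the value is for: ib indexes bytes, cbLen bounds it, and both sz parameters are known to be null-terminated, so the loop condition can be checked by reading the prefixes alone.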

The bottom line is that when you’re criticizing Hungarian, you need to understand which Hungarian you’re really complaining about. Hungarian as defined by Simonyi isn’t nearly as bad as some have made it out to be.

This is not to say that Apps Hungarian was without issues. The original Hungarian specification was written by Doug Klunder in 1988. One of the things missing from that document was a discussion of the difference between “type” and “intent” when defining prefixes. This can be a source of great confusion when defining parameters in Hungarian. For example, if you have a routine that takes a pointer to a “foo” as a parameter, and internally the routine treats the parameter as a single pointer to a foo, it’s clear that the parameter name should be “pfoo”. However, if the routine treats the parameter as an array of foos, the original document was not clear about what should happen – should the parameter be “pfoo” or “rgfoo”? Which wins, intent or type? To me, there’s no argument: it should be intent, but there have been some heated debates about this over the years. The current Apps Hungarian document is quite clear about this: intent wins.
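The pfoo versus rgfoo question is easier to see in code. In the C sketch below, both routines take exactly the same C type (a pointer to FOO), and only the name records the intent. FOO, ValueFromPfoo and SumRgfoo are hypothetical names for illustration; the cfoo (count of foos) and ifoo (index into foos) names follow the same prefix pattern as “cb” and “cch” and are my assumption about how one would spell them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for this illustration. */
typedef struct FOO {
    int value;
} FOO;

/* Intent: a pointer to exactly one FOO, so the name is "pfoo". */
int ValueFromPfoo(const FOO *pfoo)
{
    assert(pfoo != NULL);
    return pfoo->value;
}

/* Intent: an array of FOOs, so the name is "rgfoo", paired with a
   count of foos ("cfoo"). The C type is identical to the one above. */
int SumRgfoo(const FOO *rgfoo, size_t cfoo)
{
    size_t ifoo; /* index into the array of foos */
    int sum = 0;

    assert(rgfoo != NULL || cfoo == 0);
    for (ifoo = 0; ifoo < cfoo; ifoo++) {
        sum += rgfoo[ifoo].value;
    }
    return sum;
}
```

The compiler can't tell these two parameters apart, which is exactly why “intent wins”: the name is the only place the difference can live.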

One other issue with the original document was that it predated C++. So concepts like classes weren’t really covered and everyone had to come up with their own standard. At this point those issues have been resolved. Classes don’t have a “C” prefix, since a class is really just a type. Members have “m_” prefixes before their actual name. There are a bunch of other standard conventions but they’re relatively unimportant.

I used Hungarian exclusively when I was in the Exchange team; my boss was rather a Hungarian zealot and he insisted that we code in strict Apps Hungarian. Originally I chafed at it, having always assumed that Hungarian was stupid, but after using it for a couple of months, I started to see how it worked. It certainly made more sense than the Hungarian I saw in the Systems division. I even got to the point where I could understand what an irgch was without even flinching.

Now, having said all that, I don’t use Hungarian these days. I’m back in the systems division, and I’m using a home-brewed coding convention that’s based on the CLR standards, with some modifications I came up with myself (local variables are camel cased, parameters are Pascal cased (to allow easy differentiation between parameters and local variables), class members start with _ as a prefix, globals are g_Xxx). So far, it’s working for me.

I’ve drunk the kool-aid from both sides of the Hungarian debate though, and I’m perfectly happy working in either camp.

Comments

Anonymous
June 22, 2004
Your points are absolutely valid. But Rory's "classic" made me laugh, which is always a good thing.

You're absolutely right, his point was to bash bad Hungarian and throw the baby out with the dishtowel (purposely mixing metaphors).

I'm hoping to write about the negatives in Hungarian sometime in the future (maybe tomorrow). They can be quite significant actually, which is why I don't code in it these days. Especially with Systems Hungarian, where the Hungarian represents the type not the intent.

What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul. Also, what's the thing about the difference between dw and ul? Why do you differentiate? Apps Hungarian actually forbids the use of dw because it's a compiler/hardware specific type.
Anonymous
June 22, 2004
I've seen this growing fashion for using an underscore prefix for member variables. Sutter's "Exceptional C++" uses it extensively, for example.

Frankly, I don't like it. As I understand it, the C/C++ standards state that identifiers beginning with an underscore are reserved for the use of the language/library implementer.

Thus, by using an underscore prefix, you're explicitly allowing your implementation to break your code -- by defining macros with underscores, for example.

Do you have any comments about this?
Anonymous
June 22, 2004
An interesting point. I'm trying to avoid the m_Xxx thingy because it's too "mfc-ish" for my tastes, do you have an alternative suggestion?
Anonymous
June 22, 2004
"What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul."

Most every text editor supports a "find and replace" function that easily lets me change the variable names. Only Visual Studio allows me to hover over a var name and find out the type (which doesn't always work and sometimes requires a restart to make it work) or lets me right-click on a var/method and "go to definition" (the best trivial IDE function I've ever seen)
Anonymous
June 22, 2004
The C# coding guidelines are intended for public classes, which is why they don't differentiate between fields and non-fields - in fact if you run fxcop, it complains about having public fields.

I agree that m_ and g_ are the only carryovers, I may start using them again :)
Anonymous
June 22, 2004
I have to admit that go to definition is the best feature since the pop-up of an object's methods and properties.
Anonymous
June 22, 2004
My copy of Exceptional C++ uses an underscore for data members, but as a suffix, not prefix, which is ok as far as the standard is concerned.

I have also seen (and used) a convention where instance data members start with “its” (itsName) and class data members start with “their” (theirCount).

Although, in C++ on Win32 it’s hard to use any naming conventions, with STL using lowercase for almost everything, and Win32 using Pascal case for functions and all caps for data types. (I once worked with a guy who consistently named (non-POD) classes with all caps… I almost heard the code scream every time I looked at it!)
Anonymous
June 22, 2004
> As I understand it, the C/C++ standards state that identifiers beginning with an underscore are reserved for the use of the language/library implementer.

"In addition to the names documented in this manual, reserved names include all external identifiers (global functions and variables) that begin with an underscore (_) and all identifiers regardless of use that begin with either two underscores or an underscore followed by a capital letter are reserved names. This is so that the library and header files can define functions, variables, and macros for internal purposes without risk of conflict with names in user programs."

http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

So, a member name beginning with an underscore and a lowercase letter would be safe, since it's not a global symbol (and macros would use two underscores or an underscore and an uppercase letter).

Of course, you also have to avoid all the hundreds of function and variable names defined in the standards, because any of them can be implemented as a macro (for instance, errno). Some other prefixes and suffixes are reserved too; see the page above for details (no source code, so probably Safe For Microsoft Employees).

I think there's no need to be that pedantic; if you can't compile something, it means you have the source code for that something, and can edit it to remove any incompatibilities. Of course, I avoid naming my types something_t, since I know it's reserved, but I don't check every name I create with the manual or the standards.
Anonymous
June 22, 2004
Actually in the Exchange Store, the classes are all in all caps (OFOLD, OMSG, etc). They're used internally as all lowercase (pofoldFoo, etc).
Anonymous
June 22, 2004
I use F as prefix for data members and A as prefix for parameters. It's something that's left over from Borland's TV/OWL/VCL coding standards (which have been close to the same since the very early 90's). I don't like m_, g_ or anything else with an underscore in it. There's really no need for an underscore unless you can't find the shift-key on your keyboard and write everything in lower or UPPER case, neither of which I want to see in my code.
Anonymous
June 22, 2004
The main problem is that it makes names too long and too similar in shape to other names. They then need to be read rather than recognised (fluent readers recognise word shapes; poor readers try to spell out each word, which takes so long that they forget the previous words and so can't extract meaning from a sentence [those who can't read at all are a very small fraction of illiterate people]).

So in short. It sounds like a good idea but not for humans. I'll also speak for the cat, it can't make heads or tails of it either.
Anonymous
June 22, 2004
I second the comment about go to definition / xref browsing being the best reason to use an IDE over just text editors. I honestly have no idea how people can debug / code in large source bases without them. :P

Personally, I use "the" as my prefix for member variables -- although this is mostly because I hate typing underscores (otherwise I'd probably use a leading underscore). Either way, I agree with the gist of the article that Apps Hungarian is FAR more sensible than Systems Hungarian. Again, though, I'm coming from the stance that I read code in my debugger which gives me right click type information, goto def, and xref browsing etc. YMMV.
Anonymous
June 22, 2004
"My copy of Exceptional C++ uses an underscore for data members, but as a suffix, not prefix"

Typical, the one page that I choose to base my complaint on (Item 20) seems to be the only page in the entire book to use an underscore prefix :-)

I'll concede the point about leading underscores only being reserved for global identifiers, though.
Anonymous
June 22, 2004
Hungarian's useful in situations where the type system isn't strong enough to express your intent fully. In C, practically anything can be promoted to anything else with no casts, and strings and arrays aren't first-class types (or even types at all in the case of strings) so you need some way for the programmer to be able to see whether the operations performed are in fact correct - because the compiler can't help you.

However, there's little need to be warty when using user-defined types (structs) except around the implicit conversion between any pointer type and void*.

In C++, there's even less need because there is no implicit conversion to void* - you must use a cast. A C++ program will tend to have less typeswitching and peculiar casts due to the use of polymorphism.

The ultimate in static type systems still has to be Ada, in which you can define new integer types that don't have implicit conversions between themselves, the built-in Integer type, or any other integer types, and you can also define range-restrictions of types (keeping the implicit conversions of its parent type). The main problems with Ada are that its interop with C requires writing declarations, its syntax (derived from or inspired by Pascal) is verbose (due to a requirement to be largely LL(1) parsable) and that the object-oriented extensions of Ada 95 aren't. That is, you type Fn( obj, arg1, arg2 ) rather than obj.Fn( arg1, arg2 ).

In Ada, warting is completely unnecessary. Instead you should define new types. It doesn't completely prevent errors (there was a space project where a lander completely missed its target because the programmers were working in traditional units while the scientists were working in metric) but it can be helpful.

Turning to more practical languages <g> in C# it's also largely unnecessary to wart, although the index/count difference can be necessary. If you're programming in VB.NET, no warts are required if you turn on Option Strict.

(to wart: to decorate your variables and parameters with the type; warts: the decorations themselves)
Anonymous
June 23, 2004
"The first one, which is the one that most people know about, is “Systems Hungarian”. System’s Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig”. In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi."

Hi Larry. Now I'm famous :). The story is that the hungarian bastardization originally came from the documentation folks. In the systems group we originally produced raw documentation for them that had standard "apps like" hungarian. They decided it was too obtuse for documentation so they did some serious readability changes to it. They are not programmers so this wasn't a graceful operation. This had a huge secondary effect because new programmers in the systems group would read the documentation and "more or less" reproduce that "documentation group" style. Not to mention books were written about the api that referenced that style. Pretty soon we had more code in this "docs group" style than in any other style.

A smaller effect came from Win32 birthing. Many new kernel32 apis were created and what you see in those apis is MarkL's personal interpretation of what he read as the style in the documentation. Many "CountOfBytes" instead of "cb", "IndexOfX" instead of ix, etc.

Good to hear you are still cranking away. I hope all is going well with you.
Anonymous
June 23, 2004
Wow! Thanks Scott for the clarification.

I apologize for taking your name in vain, btw, I didn't have contact info for you so I didn't check it with you. I REALLY appreciate the clarification.

And yes, I'm still cranking away here, I'm over in multimedia land (talk about strange journeys) but I'm still having fun. It's scary, I hit 20 years in 2 months.
Anonymous
June 23, 2004
Dude, that was NINE months ago. How is this possibly following on the heels? Just how big do you think my heels are?

:-)
Anonymous
June 23, 2004
One of the best arguments for HN in the "old" days was to reduce the need to PageUp to see variable declarations. Two (unrelated?) factors that render this less necessary/cumbersome are (1) large, hi-res displays that let the developer see more lines - I can easily see 60-80 LOC on my screen right now; and (2) "improved" programming practices/languages leading to smaller, more cohesive routines, where definition and usage are extremely close.

Where the definition of a variable cannot be local (forms controls are the most obvious) I still feel the need to see some qualifier for ease-of-comprehension. That said, these days I prefer UserNameTextBox to txtUserName...
Anonymous
June 23, 2004
Larry,

Thanks to you, I learned today that I've been 'doing' Apps Hungarian forever. I didn't even know it had a name ;-)

My personal rules are pretty much consistent with Simon's Lite rules (although I have a few more).

I tend to specify the type only if it has some importance, e.g. if I have a WORD variable, there's most likely a reason why it's not simply an int. So people who read the code had better be aware of it.

But in most cases, my Hungarian prefixes are just abbreviations for common words that should otherwise appear in the variable name. e.g. I truly hate nNumberOfBytes. cBytes does a way better job IMHO.
