Hungarian Notation Is Clearly (Good|Bad)

A commenter asked:

thread_local X tlsX; ??

Herb, I hope you aren’t backtracking on Hungarian Notation now that you work for Microsoft. Say it aint so…

It ain’t so. Besides, Microsoft’s Framework Developer’s Guide prominently intones: “Do not use Hungarian notation.”

Warts like “tls” and “i” are about lifetime and usage, not type. Here “tls” denotes that each thread gets its own copy of the value that is constructed and destroyed once per thread that uses it (lifetime) and doesn’t need to be synchronized using mutexes (usage). As another example of usage warts, I’ll also use “i” for a variable that’s used as an index or for iteration — and given that the “i” convention goes back to before BASIC, we shouldn’t try to pin the Hungarian tail on that donkey.
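As a minimal sketch of what the “tls” wart promises: the names tlsCounter, bump_and_read, and run_in_thread below are invented for illustration and are not from the commenter’s code. Only thread_local and std::thread are standard C++11.

```cpp
#include <cassert>
#include <thread>

// "tls" says: one copy per thread, created for each thread that uses it
// and destroyed when that thread exits.
thread_local int tlsCounter = 0;

// Increments the calling thread's own copy n times and returns it.
// No mutex is needed because no other thread can see this copy.
int bump_and_read(int n) {
    assert(n >= 0);                  // illustrative precondition
    for (int i = 0; i < n; ++i) {    // "i" is a usage wart: a loop index
        ++tlsCounter;
    }
    return tlsCounter;
}

// Runs bump_and_read(n) on a fresh thread and reports what that thread saw.
int run_in_thread(int n) {
    int result = 0;
    std::thread worker([&result, n] { result = bump_and_read(n); });
    worker.join();
    return result;
}
```

Each call to run_in_thread starts from zero because every new thread gets a fresh tlsCounter, and the calling thread’s own copy is left untouched.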

Having said that, though, many people have variously decried and defended different forms of Hungarian, and you may notice a pattern… they’re mostly:

Today, “Hungarian” nearly always means Systems Hungarian. The main trouble with Systems Hungarian comes from trying to embed information about a variable’s type into the variable’s name by prepending an encoded wart like the venerable sz, pach, ul, and their ilk. Although potentially helpful in a weakly-typed language like C, that’s known to be brittle and the prefixes tend to turn into lies as variable types morph during maintenance. The warting systems also don’t extend well to user-defined types and templates.
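Here is a small sketch of those failure modes, under stated assumptions: ulTotal, Meters, total_of, and check_total_of are hypothetical names made up for this example, and the “originally unsigned long” history is imagined.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Imagine this began life as `unsigned long ulTotal`. During maintenance the
// type became a signed 64-bit integer, but the "ul" wart stayed and now lies.
std::int64_t ulTotal = 0;
static_assert(!std::is_same<decltype(ulTotal), unsigned long>::value,
              "the prefix claims unsigned long; the compiler knows better");

// A user-defined type: the classic wart tables have no prefix for it.
struct Meters { double value; };
Meters operator+(Meters a, Meters b) { return Meters{a.value + b.value}; }

// In a template, T is unknown when the names are written, so no type prefix
// can be chosen for `sum` or `x`.
template <typename T>
T total_of(const std::vector<T>& items) {
    T sum{};                          // value-initialized: zero for numbers
    for (const T& x : items) sum = sum + x;
    return sum;
}

// Quick self-check of the sketch.
void check_total_of() {
    assert(total_of(std::vector<int>{1, 2, 3}) == 6);
}
```

The same total_of works for int, double, and Meters, which is exactly why a type-encoding prefix has nothing useful to say inside it.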

I’ve railed against the limitations and perils of Hungarian with Andrei Alexandrescu in our book C++ Coding Standards, where I made sure it was stated right up front as part of “Item 0: Don’t sweat the small stuff (or, Know what not to standardize)”:

Example 3: Hungarian notation. Notations that incorporate type information in variable names have mixed utility in type-unsafe languages (notably C), are possible but have no benefits (only drawbacks) in object-oriented languages, and are impossible in generic programming. Therefore, no C++ coding standard should require Hungarian notation, though a C++ coding standard might legitimately choose to ban it.

and with Jim Hyslop in our article “Hungarian wartHogs”:

“… the compiler already knows much more than you do about an object’s type. Changing the variable’s name to embed type information adds little value and is in fact brittle. And if there ever was reason to use some Hungarian notation in C-style languages, which is debatable, there certainly remains no value when using type-safe languages.”

There’s an amusing real-world note later in that article. I’ll pick up where the Guru is saying:

“I recall only one time when Hungarian notation was useful on a project. … One of the programmers on the project was named Paul,” the Guru explained. “Several months into the project, while still struggling to grow a ponytail and build his report-writing module, he pointed out that Hungarian notation had helped him find a sense of identity, for he now knew what he was…” She paused.

I blinked. It took me about ten seconds, and then I shut my eyes and grimaced painfully. “Pointer to array of unsigned long,” I groaned.

She smiled, enjoying my pain. “True story,” she said.

It is indeed a true story. I worked with him in 1993. The joke was bad then, too, but at least using Hungarian was more defensible because the project’s code was written in C. I found it handy at the time, but now that I work in more strongly typed languages I find the type-embedding version of Hungarian more actively harmful than actually useful.

You might call me a Hungarian emigrant, now living happily as an expat out in eastern C++ near the Isle of Managed Languages.

28 thoughts on “Hungarian Notation Is Clearly (Good|Bad)”

  1. It’s hard to beat the conciseness of Hungarian for tagging “size” variables – “cch” for a count of characters, “cb” for a count of bytes, “cc” for a count of array elements. In general, Hungarian is only really useful for value types/POD, where polymorphism is extremely unlikely and ambiguity in semantics almost always inevitable.

  2. There may be another reason to use Hungarian notation. If you are writing MFC code, which already uses Hungarian, you need a convention that looks consistent with the library code. This does not hold for STL-like libraries, since most of the time we are users of the STL code rather than extenders.

  3. Given your argument, is this a good place to request the type trait is_thread_local be added to N2659, presuming N2659 appears in C++0x? A partially specialised mutex template using that type trait could select best mutex use.

  4. In general I use the pData notation for pointers, and this seems very logical to me. Does no one else write it this way, too?

  5. mlcr, why would that help? If you type -> instead of . the compiler will tell you; most stuff just won’t compile if you use a pointer as a reference/value or vice versa. It’s not about being logical, it’s about not adding info that doesn’t contribute anything but might get out of sync, ending up as a liability instead.

  6. mlcr and Peirz, I also generally use pData notation for pointers in C/C++. I find it helps both the writer and reader. There’s a significant difference between “if (size == 0)” and “if (pSize == 0)”, for example.

    It also doesn’t suffer as many of the downsides of other prefixes. If a property changes from being a pointer, the use of the property changes in most cases and needs to be updated. It’s actually useful in this case as it helps ensure each use is considered.

  7. The point is that the IDE will tell you stuff when you are writing, but not when you are reading. Since code is read more than written, it’s just much clearer to use pData or DataPtr or whatever.

  8. “the compiler already knows much more than you do about an object’s type”. Irrelevant. Variable names are for the code reader, not just the compiler. “Changing the variable’s name to embed type information adds little value and is in fact brittle.” It’s only brittle if you change the type often. How often do you change a variable’s type, and how much does it cost to do a find/replace?

    Most people read code more often than they write it. The IDE is not always available (and it’s certainly not quick, if we’re talking about Visual Studio). The variable declaration is often far away from its usage. Having the type embedded in the name saves people from having to dig around for the declaration.

  9. People who love Hungarian, let me ask you something. Have you tried programming without it? Have you used any managed languages like C# or Java? Have you used Hungarian in those languages? No? Why?

    If you think C++’s type system is less strict than C#’s or Java’s, then you’ve been programming in C rather than C++. And I understand your affection for Hungarian.

  10. In response to Alex: I use m_… and m_p… to indicate a member variable/pointer and p… to indicate a global or parameter pointer. I find it makes reading and understanding the code easier, because it gives a clue about the lifetime of an object and its scope. Java doesn’t need the m_… because everything is a member of a class, and no p… because there are no pointers. I work on a large project that traditionally used MS’s Hungarian notation, and I can bear witness that it can be very confusing if a variable used to be an m_rsData (RoguewaveString) and has subsequently been changed to a pointer to an object without its name having been altered.

  11. I personally *used* Hungarian back in the day, coming from C, in C++. But my experience is that it makes the code less readable (bBool, szString, pszString, and so on), and in fact the prefixes start to lie over time. Yes, when you change the type, the developer has to check all uses, but there will be developers who fall back on the worst programming tool in existence (copy/replace [All]); so much for careful change.
    None of us is such a copy/paste programmer, right, but we all know some who are …

    As far as the m_.. is concerned, this is for indicating the scope, not the type. Member variable is different from local variable, etc..
    So it is a different thing, it is not Hungarian notation.
    As for the Java remark someone made above, m_ also applies there, because you can also have local (stack) variables (and yes, the ‘p’ will not happen).

    One suggestion regarding the remark that ‘the declaration of the variable can be far away from its point of usage’: in an agile manner you try to have very small functions/methods (a few lines), and in C++ you always declare your variable at the point of first usage [C: back in the day you had to declare at the start of the function/block], so in a small function the declaration will never be far away. As for member variables, yes, the declaration is in another file. But even then the class is probably not too big (its state) and things can be found quickly; otherwise extra abstractions may well be possible and the class is doing way too much work.

    Oh yes : I don’t use Hungarian notation anymore ;-)

  12. Class *members*:
    mName; // any member (minimum)
    * mpName; // any ptr (minimum)

    static msName; // any static member (minimum)
    static * mspName; // any static ptr (minimum)

    Function, method *arguments*:
    aName; // any arg (minimum)
    * apName; // any ptr arg (minimum)
    & arName; // any ref arg (minimum)
    const akName; // any const arg
    const * akpName; // any const ptr arg
    const & akrName; // any const ref arg

    Local *variables*: a, ab, abc..Name, pName, vName, vpName
    (“a, ab” mean any one or two lowercase letters.
    “abc..[Name]” means all lc or camelback not conflicting with
    function args and class data members)

  13. BTW, Regarding my first post, it appears multiple occurrences of the word “type” enclosed by less than and greater than signs got deleted, thereby rendering my post meaningless.

  14. about the m_name notation:
    I find it extremely silly to re-implement the wheel this way.
    If the class is too big to easily see that some variable is a member variable, then you can always use the built-in notation this.var or this->var (depending on the language, of course).

    about IDEs:
    IDEs are not just useful for writing code. A proper IDE actually parses your code and understands it (using an AST to represent your code).
    Such a tool can help you read code not only by highlighting it (regular text editors can do that too) but also by showing you tooltips with the documentation, letting you click an identifier to go to its definition, showing you a variable’s type in a popup, etc…
    If you use such a tool, then it is surely worth the additional startup delay to get all that convenience.

  15. As for arguments that Hungarian notation helps when reading code: if you can’t see that house->Build means that house is a pointer to an object, then you should not be allowed to edit that code. From my experience, developers should spend more time worrying about giving their variables more appropriate names than encoding type information into the variable name, as this is what truly helps when reading code.
    Your functions/methods should do one thing and do that one thing well and be typically small. Therefore, the only far-away variable declarations will be those of member variables in the header file, so it will be clear that they are not local variable declarations. And the editor helps in this case, as you can right-click and go to the definition. Also, you’ve got class view. (Assuming that you are using the VC++ IDE, of course.)

  16. “… the compiler already knows much more than you do about an object’s type. […]”

    Come on guys.. We all know that the notation is not for the compiler. For the compiler it’s totally irrelevant what you call your variables.
    These notations are for the developers and maintainers of the code. Consider the following example:
    if (somevar) { dosomething();}
    and compare it with the following examples:
    if (bSomeVar) {dosomething();}
    if (pSomeVar) { dosomething();}

    The main advantage of Hungarian notation is that in these cases (and potentially in other cases as well) it makes the code significantly more readable.

    I personally use a simplified version of the Hungarian notation, b for boolean, p for pointers, s for strings (why would you use the extra z if nowadays nobody uses the Pascal format?) and n for any integer type (again, why distinguish between them?). And that’s it.

    I have seen people heavily arguing against the Hungarian notation although they use prefixes like ‘is’ (e.g.: isActive). The b prefix clearly expresses the same thing and it’s half as long ;)

  17. Well said rH.

    I invite any of you to come work on a horribly coded, non-standardized application which I have been maintaining daily for the past 2 and a half years. If you can tell me what information “MyVar” holds without having to waste 30 – 60 minutes sifting through a 5,000 line document then I bow at your feet good sir. Yes, you can ctrl+f to search for its meaning, I grant you that, but if that variable is used more than a few times it consumes more time than is necessary when shorthand could have given you a rather large clue within half a second. It’s especially horrible when a variable is ambiguously used several times in one document for multiple purposes.

    Despite the horrible general naming convention of “MyVar”, it would still assist me immensely if the variable were simply prepended with “s”. At least then I would know it’s a string type, or that it’s supposed to be anyhow.

    For the record, I’ll also state that Hungarian is meant to hold semantic information about the type, thereby indicating the variable’s purpose – it is not meant to set in stone that this is what it will be used for – only an indication – the fact that you are changing variable type during your app and not reflecting such changes within your writing style is entirely inconsequential to the point, and entirely your own fault :)

    Use it, don’t use it – it makes no difference to me unless I happen to read your code.. then I’ll probably strangle you for wasting my time unless you employ some other method which prevents me from having to search every line of your document for the meaning of “CheeseAndCrackers”.

  18. Exactly!! Everyone who is against Hungarian notation writes their code and disappears coz they can sift through their code after 6 months. Then people like us have the misfortune of maintaining their code while they blog about how the IDE knows more about their variables than they do.

    Please try to work on engineering software with ints/doubles/floats/longs flying around in a 100 line function and then try to figure out why the result is off by the third decimal place after all the unit conversions take place. Don’t tell me you should not be doing this and that coz I didn’t. I just need to fix the problem and get out of the mess as fast as I can.

    Also the human brain can only remember 7 things at a time unlike the compiler so if you have a dozen variables flying around in a function doing magical things be prepared to keep right clicking and “Go to definition” a few hundred times.

  19. So, in short, don’t use “Systems Hungarian” because the C++ compiler is type safe.

    But what about trying to distinguish a Windows code-page string, a UTF-8 string, a UCS2 string, and a UTF-16 string? Imagine how useful it would be to do this:

    //utf-8: each char takes 1 to 4 bytes of space.
    std::string str8LastName;
    //Windows code page.
    std::string strLastName;
    //utf-16: each char is 2 or 4 bytes.
    std::wstring str6LastName;
    //UCS2: all chars exactly 2 bytes in size.
    std::wstring str2LastName;

    And Unicode aside, the compiler also does not know whether or not a char* is null-terminated. In short, it doesn’t take long to compile a list of noticeably valuable “App Hungarian” to recommend and follow.

  20. Kind of an old topic but I did want to comment on Mark’s post.
    “If you can tell me what information “MyVar” holds without having to waste 30 – 60 minutes sifting through a 5,000 line document then I bow at your feet good sir.”

    If you are naming variables “MyVar” then you shouldn’t be designing at all.

  21. “If you are naming variables “MyVar” then you shouldn’t be designing at all.”

    You understand that he’s not designing the code, right? He’s working on it and having to deal with nonsense names. Hungarian isn’t the nonsense part of the name. Hungarian notation should NOT be considered a replacement for variable names that make sense.

    When looking at the code as a whole, Hungarian notation can make a world of difference in keeping you from having to mouse over a particular variable when looking at the document as a whole. I’m not saying that Hungarian is the be all end all, but there are many situations in which it makes sense.

  22. “You” means whoever wrote it.

    So mouse over is not acceptable in today’s IDEs for those reviewing code? If it takes you 30-60 mins to find out what a variable is/does, then you as the reviewer should find something more suitable as a career.

    Please provide a situation that “makes sense” where Hungarian notation is useful in today’s coding environment. Reading it is not a proper example. I can find what a variable is/does in less than 10 seconds with a find tool, which btw any good editor has, and most have a go to definition.

  23. Like I just said, “when looking at the document as a whole” having to mouse over every variable to understand what it is capable of, what properties it has, etc… wastes time.

    For those of us that are only able to keep track of part of the different variable types on a page at a time, this process of checking types becomes very time consuming.

    I come from a C coding background (not C++). It made sense to use Hungarian and its ilk to keep track of these things, not just because “go to definition”, coding standards limiting the amount of code that gets put in a class, and IntelliSense didn’t exist. When first approaching code for maintenance, it’s nice to not have to go through the definitions for all of the variables of a class. That takes a lot of time. If I have a convention that displays that information for me in a way that makes sense, it tends to be a bit of a time saver.

    (as stated above) “I’m not saying that Hungarian is the be all end all.” At the end of the day, readability is the only thing that matters to a coder when talking about variable names. Just like when some people prefer 2+2=4, some other weirdos (most developers) wouldn’t cringe at 4=2+2 at all. It’s about readability and making sure that people can understand the code as quickly as possible. If you have a way to do that, use it. I think most people understand Hungarian and can gain some utility out of it. I’ll use that.

  24. Once again, all you are doing is repeating yourself with “reading” the code. So you are telling me Hungarian notation keeps you from having to find out the origins of the variable? I bet most of the time you go to definition to find out what the variable is or mouse over that variable.

    I am going to kill this argument because you have not provided an example outside of reading and plenty of others have provided examples where it causes the code to be brittle.

    Welcome to the land of managed code; we don’t care that you worked with C. It’s time to grow into the languages you are working with.

  25. The problem with this argument is that, as with comments, when you are maintaining code you did not write, especially “horribly coded, non-standardized” code – relying on the NAME of the variable to tell you its type is akin to buying a Rolex from someone off the street in Vegas…

    Ultimately, for Hungarian notation (of any type) to be useful requires a great deal of discipline and agreement on the programming team – after 15 years as a professional developer, I have yet to find a team larger than 2 that can pull it off.
