Type Inference vs. Static/Dynamic Typing

Jeff Atwood just wrote a nice piece on why type inference is convenient, using a C# sample:

I was absolutely thrilled to be able to refactor this code:

StringBuilder sb = new StringBuilder(256);
UTF8Encoding e = new UTF8Encoding();
MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();

Into this:

var sb = new StringBuilder(256);
var e = new UTF8Encoding();
var md5 = new MD5CryptoServiceProvider();

It’s not dynamic typing, per se; C# is still very much a statically typed language. It’s more of a compiler trick, a baby step toward a world of Static Typing Where Possible, and Dynamic Typing When Needed.

It’s worth making a stronger demarcation among:

  • type inference, which you can do in any language
  • static vs. dynamic typing, which is completely orthogonal but all too often confused with inference
  • strong vs. weak typing, which is mostly orthogonal (e.g., C is statically typed because every variable has a statically known actual type, but also weakly typed because of its casts)

Above, Jeff explicitly separates inference and dynamic-ness. Unfortunately, later on he proceeds to imply that inference is a small step toward dynamic typing, which is stylistically true in principle but might mislead some readers into thinking inference has something to do with dynamic-ness, which it doesn’t.

Type Inference

Many languages, including C# (as shown above) and the next C++ standard (C++0x, shown below), provide type inference. C++0x does it via the repurposed auto keyword. For example, say you have an object m of type map<int,list<string>>, and you want to create an iterator to it:

map<int,list<string>>::iterator i = m.begin(); // type is required in today’s C++, allowed in C++0x
auto i = m.begin(); // type can be inferred in C++0x

How many times have you said to your compiler, “Compiler, you know the type already, why are you making me repeat it?!” Even the IDE can tell you what the type is when you hover over an expression.

Well, in C++0x you won’t have to any more, which is often niftily convenient. This gets increasingly important as we don’t want to, or can’t, write out the type ourselves, because we have:

  • types with more complicated names
  • types without names (or hard-to-find names)
  • types held most conveniently via an indirection

In particular, consider that C++0x lambda functions generate a function object whose type you generally can’t spell, so if you want to hold that function object and don’t have auto then you generally have to use an indirection:

function<void(void)> f = [] { DoSomething(); }; // hold via a wrapper; requires indirection
auto f = [] { DoSomething(); }; // infer the type and bind directly

Note that the last line above is more efficient than the C equivalent using a pointer to function, because the lambda's body is part of its own unique closure type, so calls through it can be inlined. For more on this, see Item 46 in Scott Meyers’ Effective STL on why it’s preferable to pass function objects rather than functions to algorithms, because (counterintuitively) they’re more efficient.
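Here is a minimal sketch of the three ways to hold a callable discussed above, plus the std::sort case Meyers describes. It assumes C++11 or later and uses std::function in place of the unqualified function above. DoSomething's counting body, less_than, and the helper function names are hypothetical. Whether the lambda version is actually faster depends on the compiler and optimization level; the sketch only shows where the inlining opportunity comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for DoSomething(): it just counts calls.
int g_calls = 0;
void DoSomething() { ++g_calls; }

// Holds the same work three ways, calls each once, and returns the call count.
int call_all_three() {
    g_calls = 0;
    std::function<void()> f1 = [] { DoSomething(); };  // type-erased wrapper (indirection)
    auto f2 = [] { DoSomething(); };                   // the lambda's own unnamed closure type
    void (*f3)() = &DoSomething;                       // C-style pointer to function
    f1();
    f2();
    f3();
    return g_calls;
}

// Hypothetical comparator used through a function pointer.
bool less_than(int a, int b) { return a < b; }

std::vector<int> sort_with_pointer(std::vector<int> v) {
    // The comparator is a runtime pointer value inside std::sort.
    std::sort(v.begin(), v.end(), &less_than);
    return v;
}

std::vector<int> sort_with_lambda(std::vector<int> v) {
    // The comparator's body is part of the closure type that std::sort is
    // instantiated with, so the compiler can typically inline it.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both sorts produce the same result. The difference is only in what the compiler can see at the call site.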

Now, though there’s no question auto and var are great, there are some minor limitations. In particular, you may not want the exact type, but another type that it can be converted to:

map<int,list<string>>::const_iterator ci = m.begin(); // ci’s type is map<int,list<string>>::const_iterator
auto i = m.begin(); // i’s type is map<int,list<string>>::iterator
Widget* w = new Widget(); // w’s type is Widget*
const Widget* cw = new Widget(); // cw’s type is const Widget*
WidgetBase* wb = new Widget(); // wb’s type is WidgetBase*
shared_ptr<Widget> spw( new Widget() ); // spw’s type is shared_ptr<Widget>
auto w = new Widget(); // w’s type is Widget*, never const Widget*, WidgetBase*, or shared_ptr<Widget>

So C++0x auto (like C# var) only gets you the most obvious type. Still and all, that does cover a lot of the cases.
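These deductions can be checked by the compiler itself. The following is a minimal sketch, assuming C++17 (for std::is_same_v and single-argument static_assert) and hypothetical empty Widget/WidgetBase classes. Each static_assert states which type auto picked. The last few lines show the usual way to get a different type while still writing auto: change the initializer (cbegin(), make_shared) rather than the declaration.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical classes standing in for Widget and WidgetBase.
struct WidgetBase { virtual ~WidgetBase() = default; };
struct Widget : WidgetBase {};

using Map = std::map<int, std::list<std::string>>;

bool check_deductions() {
    Map m;

    auto i = m.begin();                  // m is non-const, so begin() yields iterator
    Map::const_iterator ci = m.begin();  // spelling the type requests the conversion
    static_assert(std::is_same_v<decltype(i), Map::iterator>);
    static_assert(std::is_same_v<decltype(ci), Map::const_iterator>);

    auto w = new Widget();               // always exactly Widget*
    static_assert(std::is_same_v<decltype(w), Widget*>);
    static_assert(!std::is_same_v<decltype(w), const Widget*>);
    static_assert(!std::is_same_v<decltype(w), WidgetBase*>);
    delete w;

    // To get another type with auto, make the initializer produce that type.
    auto ci2 = m.cbegin();               // cbegin() (C++11) returns const_iterator
    static_assert(std::is_same_v<decltype(ci2), Map::const_iterator>);
    auto spw = std::make_shared<Widget>();
    static_assert(std::is_same_v<decltype(spw), std::shared_ptr<Widget>>);

    // The map is empty, so both begin iterators equal end.
    return i == m.end() && ci == ci2 && spw != nullptr;
}
```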

The important thing to note in all of the above examples is that, regardless of how you spell it, every variable has a clear, unambiguous, well-known and predictable static type. C++0x auto and C# var are purely notational conveniences that save us from having to spell it out in many cases, but the variable still has one fixed and static type.
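Here is a small sketch of that point, assuming C++17 for std::is_same_v and single-argument static_assert. The function name is hypothetical. Assigning a value of another type to an auto variable converts the value. The variable's static type never changes.

```cpp
#include <cassert>
#include <type_traits>

// auto fixes the variable's static type once, from its initializer.
int auto_type_is_fixed() {
    auto x = 10;                                       // x's static type is int
    static_assert(std::is_same_v<decltype(x), int>);
    x = 3.7;  // allowed, but 3.7 is converted to int 3; x does not become a double
    // x = "hello, world";  // would not compile: no conversion from const char* to int
    static_assert(std::is_same_v<decltype(x), int>);   // still int
    return x;
}
```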

Static and Dynamic Typing

As Jeff correctly noted in the above-quoted part, this isn’t dynamic typing, which permits the same variable to actually have different types at different points in its lifetime. Unfortunately, he goes on to say the following that could be mistaken by some readers to imply otherwise:

You might even say implicit variable typing is a gateway drug to more dynamically typed languages.

I know Jeff knows what he’s talking about because he said it correctly earlier in the same post, but let’s be clear: Inference doesn’t have anything to do with dynamic typing. Jeff is simply noting that inference just happens to let you declare variables in a style that can be similar to the way you do it all the time in a dynamically typed language. (Before I could post this, I saw that Lambda the Ultimate had also commented on this confusion. At least one commenter noted that this could equally be viewed as a gateway drug to statically typed languages, because you get the notational convenience without abandoning static typing.)

Quoting from Bjarne’s glossary:

dynamic type – the type of an object as determined at run-time; e.g. using dynamic_cast or typeid. Also known as most-derived type.

static type – the type of an object as known to the compiler based on its declaration. See also: dynamic type.

Let’s revisit an earlier C++ example, which shows the difference between a variable’s static type and dynamic type:

WidgetBase* wb = new Widget(); // wb’s static type is WidgetBase*
if( dynamic_cast<Widget*>( wb ) ) { … } // cast succeeds: wb’s dynamic type is Widget*

The static type of the variable says what interface it supports, so in this case wb allows you to access only the members of WidgetBase. The dynamic type is the type of the object actually being pointed to right now, which here is a Widget.
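Here is a self-contained sketch of the same distinction, assuming C++11 or later. The base_member and widget_member fields and the function name are hypothetical additions, included so that there is something to access. It uses a unique_ptr instead of a bare new so that the object is cleaned up, and it checks both types with typeid.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

// Hypothetical hierarchy. WidgetBase needs a virtual function (here the
// destructor) so that typeid and dynamic_cast can see the dynamic type.
struct WidgetBase {
    virtual ~WidgetBase() = default;
    int base_member = 1;
};
struct Widget : WidgetBase {
    int widget_member = 2;
};

bool static_vs_dynamic() {
    std::unique_ptr<WidgetBase> wb(new Widget());
    WidgetBase& ref = *wb;                    // static type: WidgetBase

    int b = ref.base_member;                  // OK: part of WidgetBase's interface
    // int m = ref.widget_member;             // would not compile: not in WidgetBase

    // Dynamic type: what the referenced object actually is, right now.
    bool dynamic_is_widget = typeid(ref) == typeid(Widget);
    bool static_is_base = typeid(decltype(ref)) == typeid(WidgetBase);

    Widget* w = dynamic_cast<Widget*>(wb.get());  // succeeds because the object is a Widget
    return b == 1 && dynamic_is_widget && static_is_base &&
           w != nullptr && w->widget_member == 2;
}
```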

In dynamically typed languages, however, variables don’t have a static type and you generally don’t have to mention the type. In many dynamic languages, you don’t even have to declare variables. For example:

# Python
x = 10              # x’s type is int
x = "hello, world"  # x’s type is str

Boost’s variant and any

There are two popular ways to get this effect in C++, even though the language remains statically typed. The first is Boost variant:

// C++ using Boost
variant< int, string > x; // say what types are allowed
x = 42; // now x holds an int
x = "hello, world"; // now x holds a string
x = new Widget(); // error, not int or string

Unlike a union, a variant can include essentially any kind of type, but you have to say what the legal types are up front. You can even simulate overload resolution via boost::apply_visitor, which is checked statically (at compile time).
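Boost's variant was later followed by std::variant and std::visit in C++17. These differ from Boost's versions in some details. The following minimal sketch uses the standard versions instead of Boost (this is a swap from Boost's API). Describe and variant_demo are hypothetical names. It shows visitation that the compiler checks: if Describe lacked an overload for one of the alternatives, the std::visit call would fail to compile.

```cpp
#include <cassert>
#include <string>
#include <variant>

// A visitor with one overload per alternative of the variant.
struct Describe {
    std::string operator()(int i) const { return "int " + std::to_string(i); }
    std::string operator()(const std::string& s) const { return "string " + s; }
};

std::string variant_demo() {
    std::variant<int, std::string> x;   // say what types are allowed; starts as int 0
    x = 42;                             // now x holds an int
    std::string first = std::visit(Describe{}, x);
    x = std::string("hello, world");    // now x holds a string
    std::string second = std::visit(Describe{}, x);
    // x = std::pair<int, int>{};       // would not compile: neither int nor string
    return first + "; " + second;
}
```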

The second is Boost any:

// C++ using Boost
any x;
x = 42; // now x holds an int
x = string("hello, world"); // now x holds a string
x = new Widget(); // now x holds a Widget*

Again unlike a union, an any can include essentially any kind of type. Unlike variant, however, any doesn’t make (or let) you say what the legal types are up front, which can be good or bad depending on how relaxed you want your typing to be. Also, any doesn’t have a way to simulate overload resolution, and it always requires heap storage for the contained object.
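The same idea also reached the standard library in C++17 as std::any. The following minimal sketch uses it in place of boost::any (this is a swap from Boost's API). Widget here is a hypothetical empty class. The sketch shows the trade-off described above. You don't list the legal types up front. However, extraction with any_cast must name the exact stored type, and a mismatch is detected only at run time. One difference is worth noting: the standard recommends that std::any implementations avoid heap allocation for small contained values. The heap-storage remark above therefore describes boost::any, and does not necessarily apply to std::any.

```cpp
#include <any>
#include <cassert>
#include <string>

// Hypothetical class standing in for Widget.
struct Widget {};

int any_demo() {
    std::any x;                          // no list of legal types up front
    x = 42;                              // now x holds an int
    int i = std::any_cast<int>(x);       // must name the exact stored type

    x = std::string("hello, world");     // now x holds a std::string
    // Asking for the wrong type fails at run time, not at compile time.
    bool threw = false;
    try {
        (void)std::any_cast<int>(x);
    } catch (const std::bad_any_cast&) {
        threw = true;
    }

    Widget w;
    x = &w;                              // now x holds a Widget*
    // The pointer form of any_cast returns nullptr on mismatch instead of throwing.
    bool holds_ptr = std::any_cast<Widget*>(&x) != nullptr;

    return (threw && holds_ptr) ? i : -1;
}
```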

Interestingly, this shows how C++ is well and firmly (and let’s not forget efficiently) on the path of Static Typing Where Possible, and Dynamic Typing When Needed.

Use variant when:

  • You want an object that holds a value of one of a specific set of types.
  • You want compile-time checked visitation.
  • You want the efficiency of stack-based storage where possible (avoiding the overhead of dynamic allocation).
  • You can live with horrible error messages when you don’t type it exactly right.

Use any when:

  • You want the flexibility of having an object that can hold a value of virtually “any” type.
  • You want the flexibility of any_cast.
  • You want the no-throw exception safety guarantee for swap.

28 thoughts on “Type Inference vs. Static/Dynamic Typing”

  1. Strong/weak typing is not a meaningful distinction because weak and strong are not defined. The property you are trying to express is “safe” vs. “unsafe”.

    C is not “weakly typed” because of its casts, it’s unsafe because of its casts. The casts allow one to subvert the type system, making the type system unsafe.

  2. Glen, here is a recent comp.lang.c++.moderated posting asking about the exact same thing as I am:

    http://groups.google.com/group/comp.lang.c++.moderated/msg/dca1e018039c79ff

    quote:
    —-
    Hi,
    Can anyone confirm if my observation is correct?
    According to N2673, I will not be able to write:
    class Parser
    {
        auto member = func();
    };
    but I will be able to write (according to the current Standard Draft):
    class Parser
    {
        decltype( func() ) member = func();
    };
    Which eventually will encourage me to write and use a macro:
    class Parser
    {
        DECLARE_AUTO_MEMBER( member, func() );
    };
    I have one particular usage in mind, which is the Boost.Spirit, where
    the grammar rules are implemented with class non-static members.

    —–

    –jeffk++

  3. I’m not sure I’d like your example to be legal myself, because if the type of the initializer of ‘i’ changed, the layout of the struct s would also change, something I’d prefer not to have happen quite so subtly. I also think something as core as a class declaration should be spelled out to improve code clarity. It’s fine where it will primarily be used, though, such as in local variable definitions in functions. I imagine decltype might suit your desire more here.

  4. @Lee R:

    Your example:

    struct s { int i = 0; };

    will be legal in C++0x

    I would like for it to also allow:

    struct s { auto i = 0; };

    –jeffk++

  5. “This gets increasingly important as we don’t want to, or can’t, write out the type ourselves, because we have:


    * types without names (or hard-to-find names) ”

    I even know where those hard-to-find names (or, more exactly, types) will come from:

    auto i = …

    auto j = i;

    auto g = j;

    auto v = g;

    auto m = v;

    Grats! We are going to get one more Basic.
    I hope I’m wrong.

  6. @Jeff Koftinoff

    The problem with your example is one of member initialisation and shouldn’t really be considered a limitation of ‘auto’. Consider the following (no use of auto, yet still illegal):

    struct s { int i = 0; };

  7. @Martin:

    Type inferencing doesn’t _give_ you duck typing, but it’s certainly compatible with it. Although statically-typed languages (I’m thinking ML or Haskell) tend to give you a different flavour of duck, all the more sophisticated static type systems have some way of getting the expressivity and flexibility you’d use duck typing for without sacrificing type safety.

    @Anocka:

    I don’t see how that goes beyond the “inference just happens to let you declare variables in a style that can be similar to the way you do it all the time in a dynamically typed language”, which immediately follows the quote you’re disputing.

  8. Great C++ libraries include Boost, ACE and Loki. If you do scientific computing, there are Blitz++, MTL and POOMA.
    Matthew Wilson has STLSoft, which complements STL. It is a good library but didn’t succeed because of poor documentation. It is described in his book Extended STL (the first volume has been published by Addison-Wesley; the second volume will come out in September).

  9. “Never heard of POCO, are there any other great C++ libraries besides Boost?”

    Yes, ACE.

  10. For some reason all comments in your examples are cut off at one point. I tried it in Firefox and IE, but it looks the same. Is it just me, or is there something wrong with this page?

    “Again unlike a union, a variant can include essentially any kind of type.” I think you wanted to say “any can include”, not variant.

    Never heard of POCO, are there any other great C++ libraries besides Boost?

  11. Is it true that the auto keyword in C++0x can’t be used for a class member variable, even though C++0x allows initialization of member variables outside of constructors?

    ie:

    class X
    {
    public:
        X() {}
        auto f = [](int x){ return x+2; };
    };

    int main()
    {
        X x;
        std::cout << x.f(5) << "\n";
        return 0;
    }

    Of course this example is simplistic but there are numerous places where I’d love to be able to use this pattern with c++ expression template objects being directly within another object, and I don’t see why this pattern can’t be allowed.

    –jeffk++

  12. @ vu64:
    Let me just say that I did not claim superiority of any library. I obviously have my preferences, but that is a topic for some other thread. Here, I have only provided facts. Some time ago, I have shared the same facts with Kevlin Henney, the author of boost::any. He had no problem agreeing.

    And a couple more facts:

    • boost::any has been ported to POCO
    • Poco::DynamicAny design was inspired by boost::any. It was not meant as a replacement for any, but rather as a complement, providing “softness” on the value extraction side
    • The downside of DynamicAny compared to boost::any is that it cannot hold any value out-of-the-box (a holder specialization is needed)

    There’s an article that should come out soon in Overload systematically comparing boost::any and boost::lexical_cast with Poco::DynamicAny. Stay tuned.

  13. I don’t see why you criticize boost::any so hard. Boost is the most popular C++ library. Anyway, this is the first time I’ve heard about DynamicAny. If their library is superior, can they submit it to Boost to replace boost::any?

  14. Re this:

    ——————
    WidgetBase* wb = new Widget();

    if( dynamic_cast<Widget*>( wb ) ) { … }
    ——————

    Wouldn’t wb’s dynamic type already be Widget* before the dynamic_cast?

  15. Type inference in the ML family showed me that it’s really easy to get lost in your types. When you see a variable with type vector::iterator, you can at least unravel that in a methodical fashion. But if you have to backtrack through half a dozen different variable declarations to figure out the type of your variable, it’s a lot easier to get confused. IMO, the solution here is better tool support. The IDEs should be able to perform the same type inferencing to show you the variable types, and the error messages should be better. ML didn’t have a decent IDE or error messages, but C++ will. Also, “niftily”? We’re adverbializing adjectives now? Nuts, I must have justily missed the memo.

  16. “With boost’s “any”, you need to cast it to some concrete type before you can use it.”

    Actually, boost::any can only be cast back to the exact type it holds. It works well for storing different types in an STL container. However, on the value retrieval side, it is even more rigid than the C++ language itself.

    I would also disagree with Herb’s qualification of any_cast as flexible – boost::any is static and very rigid on the value extraction side. That’s where DynamicAny (see my reply above) comes in handy.

  17. Thanks for the great post Herb. It’s good to see these distinctions getting explained.

    One of the key verbosity-reducers of dynamic languages is duck typing. With boost’s “any”, you need to cast it to some concrete type before you can use it. “any” seems more like Java’s Object than anything from a dynamically typed language.

    Type inference doesn’t give you duck typing, one of the key benefits of dynamic typing. It seems that, even with type inference, you can’t be “dynamic when needed” if that need is to call a method of a derived class, when all the compiler knows is that the object is of the base class.

  18. “Inference doesn’t have anything to do with dynamic typing.”
    Well, that’s not always true.

    In the OCaml object system, when a function calls the method m of type, say
    m : unit -> int

    on an object x, the inferred type of x is:
    x : < m : unit -> int; .. >

    Basically, the function accepts any object that has the m method. That’s close to the dynamic paradigm (except that the presence of the methods is checked at compile-time).

    Sadly, I don’t know of any other similar typing system…

  19. I think the big win with the auto keyword in C++ will be in some forms of meta-programming. A few years back I was working on an expression template system for doing vector and matrix operations and some of the template types involved were hairy to say the least. The auto keyword would have been a great help at the time in reducing things like

    VecOp op = VecOp(op1, op2);

    to

    auto op = VecOp(op1, op2);

    On an unrelated note, it’s interesting to see the reaction to Jeff’s use of var in C# over at Refactor My Code. The first three out of five comments are complaints about it. It’ll be interesting to see if var in C# gets the same reaction as closures are getting in Java.

  20. There is also Poco::DynamicAny, which conveniently combines capabilities of boost::any and boost::lexical_cast, so you can do things like this:

    DynamicAny any("42");
    int i = any; // i == 42
    any = 65536;
    std::string s = any; // s == "65536"
    char c = any; // too big, throws RangeException

    or this:

    std::cout << RecordSet(session, "SELECT * FROM Table");

    All happening dynamically, behind the scenes and (of course) in standard C++. See the Poco::Data::Row and Poco::Data::RowFormatter sample for a more complete insight.
