Reader Q&A: When will better JITs save managed code?

In the comments on last week’s interview, MichaelTK asked:

@Herb: You mentioned two things I don’t fully understand in your talk.

1) Why would C++ be a better choice for very large scale applications than .NET/Java? I mean the zero abstraction penalty (which is more a JIT compiler issue and not intrinsically hardwired into C#), okay, but besides that?

2) C++ really only has a few language features which actually let you write faster code in theory. In practice, JIT compilers are just not good enough, yet, to fully optimize on C++ pace and that’s one of the main reasons why C++ excels at efficiency.

No, the reasons go deeper than that. I’m actually giving a talk at Lang.NEXT on Wednesday which focuses exactly on the managed/native divide. I’ll post a link next week.

In the meantime, short answer: C++ and managed languages make different fundamental tradeoffs that opt for either performance or productivity when they are in tension.

Why does Microsoft not put effort into a static C++ like compiler for C#/NET, say in manner of NGen, so that C# actually has even the slightest chance of being competitive with C++?

Actually, Microsoft has been actively investing in that for over a decade. So have Java vendors. I expect those efforts to continue.

Otherwise, saying C++ is more efficient than C# is not a theoretical issue, but caused by bad JIT compilers.

This is a 199x/200x meme that’s hard to kill – “just wait for the next generation of (JIT or static) compilers and then managed languages will be as efficient.” Yes, I fully expect C# and Java compilers to keep improving – both JIT and NGEN-like static compilers. But no, they won’t erase the efficiency difference with native code, for two reasons.

First, JIT compilation isn’t the main issue. The root cause is much more fundamental: Managed languages made deliberate design tradeoffs to optimize for programmer productivity even when that was fundamentally in tension with, and at the expense of, performance efficiency. (This is the opposite of C++, which has added a lot of productivity-oriented features like auto and lambdas in the latest standard, but never at the expense of performance efficiency.) In particular, managed languages chose to incur costs even for programs that don’t need or use a given feature; the major examples are assumption/reliance on always-on or default-on garbage collection, a virtual machine runtime, and metadata. But there are other examples; for instance, managed apps are built around virtual functions as the default, whereas C++ apps are built around inlined functions as the default, and an ounce of inlining prevention is worth a pound of devirtualization optimization cure.
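
The inlining-versus-devirtualization contrast can be pictured with a minimal C++ sketch. The `Shape` and `Square` types and the two `total_area_*` functions are hypothetical names invented for this illustration; they are not taken from the talk. The first loop calls through a vtable, so an optimizer has to prove the dynamic type before it can inline anything. The second loop names the concrete type in a template parameter, so the call target is known up front.

```cpp
#include <cassert>
#include <vector>

// Dynamic dispatch: the callee is selected at run time through a vtable.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// Each iteration makes an indirect call unless the compiler can
// devirtualize it (that is, prove which override will run).
double total_area_virtual(const std::vector<const Shape*>& shapes) {
    double sum = 0.0;
    for (const Shape* s : shapes) sum += s->area();
    return sum;
}

// Static dispatch: T is fixed at compile time, so s.area() is a direct
// call and an ordinary candidate for inlining.
template <typename T>
double total_area_static(const std::vector<T>& shapes) {
    double sum = 0.0;
    for (const T& s : shapes) sum += s.area();
    return sum;
}
```

Both functions compute the same sum; the difference is only in how much the compiler knows about the call target, which is the sense in which avoiding the virtual call is cheaper than optimizing it away afterwards.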

Second, even if JIT were the only big issue, a JIT can never be as good as a regular optimizing compiler because a JIT compiler is in the business of being fast, not in the business of generating optimal code. Yes, JITters can target the user’s actual hardware and theoretically take advantage of a specific instruction set and such, but at best that’s a theoretical advantage of NGEN approaches (specifically, installation-time compilation), not JIT, because a JIT has no time to take much advantage of that knowledge, or do much of anything besides translation and code gen.

More in the talk on Wednesday (for those who are at the conference) which will go live online next week… I’ll blog about it when it’s up.

58 thoughts on “Reader Q&A: When will better JITs save managed code?”

  1. @Jon Harrop: Do you have any code samples where MSVC++ runs slower than managed?

    @Herb: It would be interesting to see code samples and benchmarks in C++ and C# which illustrate the points made in this article.

  2. Sure, many times there are complex expressions (perhaps including loops) performing computations whose parameters are known at compile time. Numeric integration of a function that has no closed-form algebraic integral, parameterized on the limits/bounds, would be an example. In C++, meta-programming (formerly with templates, now with `constexpr`) can cause those to be precomputed at compile time. JITs won’t compute those at compile time, and in fact there would be no advantage to doing so, because the compile time cost is paid for every execution. In a JITted environment, you might as well just use memoization to ensure the computation is done at most once per execution.
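
A minimal sketch of the compile-time precomputation this comment describes, assuming C++14 `constexpr` rules. The integrand `f`, the `integrate` function, and the constant `kArea` are hypothetical names chosen for illustration. The integrand here is a simple stand-in (it does have a closed-form integral) so that the result is easy to check; any function the compiler can evaluate in a constant expression would work the same way.

```cpp
#include <cassert>

// Stand-in integrand for illustration: f(x) = x^2.
constexpr double f(double x) { return x * x; }

// Composite trapezoid rule on [a, b] with n equal slices.
constexpr double integrate(double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return sum * h;
}

// The limits are compile-time constants, so initializing a constexpr
// variable forces the loop to run inside the compiler, not at run time.
constexpr double kArea = integrate(0.0, 3.0, 1000);

// The exact integral of x^2 over [0, 3] is 9; the check also runs at
// compile time.
static_assert(kArea > 8.99 && kArea < 9.01, "unexpected integral value");
```

The same `integrate` function can still be called with run-time arguments, in which case it is evaluated normally.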

  3. “This is a 199x/200x meme that’s hard to kill…a JIT can never be as good as a regular optimizing compiler because a JIT compiler is in the business of being fast, not in the business of generating optimal code”

    In my experience, the variance between different C++ compilers is much greater than between C++ and managed languages. For example, I have found that MSVC++ usually generates poor code that runs slower than most managed languages. The Clang C++ compiler usually generates fast code but it uses the same backend (LLVM) that managed languages use. So I don’t believe your claim.

    Do you have any concrete examples of optimizations done by C++ compilers that are not done by JITs because compile times would be unacceptable?

  4. Is the much mentioned LMAX the exception that proves the rule?

    It shows that if you use a tiny, non-dynamic (as in not dynamically allocating) subset of a language you can avoid nasty stalls. That is, by carefully avoiding the “default-on garbage collection” that Herb listed.

    C’s malloc can stall too; it’s easy to forget that.

    You’d use the same trick in C and C++ except you’d avoid indirection and boxing on the elements on the circular buffer because in C and C++ arrays of structs have adjacent memory and all the cache goodness that comes from that.

    (CLR with its arrays of `struct` would be that bit faster than JVM, perhaps?)

    Herb is saying that native code wins because the time a programmer is prepared to wait for a compilation is longer and affords more crunching than the time a JIT can actually spend at runtime.

    We know that PGO pays well for statically natively compiled apps yet we know that running a profile-gathering build is slightly slower; if you are always gathering statistics, are you always hurting slightly and stopping yourself from reaching optimal runtime efficiency? And to what extent does the measuring alter the actual profile of the code itself?

    But Herb, your point is more about the tradeoffs and product purpose of the mainstream JITs than about the technical absolutes, right?

  5. @Tony Arcieri

    Tony wrote: “This is exactly how LMAX works on top of the JVM. They preallocate all resources and store them inside of lock-free ring buffers as part of a framework they call Disruptor.”
    Tony, that optimization appears to work fine in certain scenarios where, I bet, memory and CPU power are plentiful.
    Many projects are written by average developers that churn code without thinking much about performance or post build optimizations.
    I suspect that LMAX falls short when moved from powerful servers to desktop, and worse on mobile devices.
    FYI, this article: http://www.codeproject.com/Articles/92812/Benchmark-start-up-and-system-performance-for-Net
    shows that for the user experience, other factors than raw performance can come into play, like start-up time and power consumption.

  6. @Ben Voigt: OK, you got me, I’d forgotten that it also did a strstr & replace for a pair of embedded magic constants.

    As for generality, the generator function worked for any method type you passed it by copying from a template function.

  7. @bcs: That’s not very general, and I’m sure you had to write an object pointer in there somewhere, not just copy existing code. Without an object pointer, you aren’t actually solving the problem I mentioned.

    @Doug: Of course C++ can benefit from limited JIT compilation also, not all cases require extensive metadata.

  8. @bulldozer00: Metadata generally means information about types and is used for things like reflection. This adds significant space overhead to libraries/assemblies. In .NET, metadata is typically several times the size of the actual code instructions in an assembly (“assembly” means approximately library or DLL). For a C++ library/DLL, the extra housekeeping information is typically a fraction of the amount of code instructions. Metadata is the main reason why managed assemblies tend to be large.

    @Ziad: I just gave a talk yesterday that covered this and will be online in a week or so; I’ll post a link when it’s up. In the meantime, note that “managed” and “native” mean different things to different people, so I generally prefer to stick with more precise things like comparing “Java/.NET” and “C++”. For example, there was a rousing discussion yesterday about whether Go is a managed language; people argue about it because “managed” is not a crisp term with a consistent definition, so my view these days is that if the term is confusing then it’s not useful and it’s best to not use it — “the only way to win is not to play” and use crisp terminology instead.

  9. To mis-quote Pilate: “What is JIT compilation?”

    I can’t find it now, but a few months back I wrote a thunk template where the only thing it did at run time to generate each thunk was memcpy machine code into a new buffer. It required zero ASM or ABI knowledge to implement. (It did however make some rather naive assumptions about the length of the function it copied.) If you want to call that JIT compilation then I can’t stop you, but I will beg to differ.

    IIRC it was even able to curry general functions rather than just methods.

  10. Hi Herb,

    Where do you draw the line between “managed” and “native” languages, especially with some newer languages like D and Go, both of which compile to native code, while providing GC and safety features, as well as higher level ones (like reflection) which C++ lacks? At the same time they allow you to get close to the metal if you really need to (though from my limited experience, D seems to allow that more than Go).

  11. @MichaelTK

    This is where I disagree with Herb.

    If we look at the Algol/Pascal language family, the compile times are pretty close to what “managed languages” do, they offer the same nice features that “managed languages” are known for, and yet they compile directly to native code while achieving performance similar to C++ compilers.

    Usually most of those compilers also offer settings to disable bounds checking, among other security checks, in order to improve performance in certain critical parts.

    Heck, even Bartok as an AOT compiler for .NET generates quite good code.

  12. Hi Herb,

    In your talks, I’ve heard you say the term “metadata” several times in the context of C++ vs managed language performance. Can you elaborate on it for me?

    Thanks.

  13. @pchethan: “Or, is this even in a scenario where managed langs are compiled directly to machine code?”

    Yes it is… As he pointed out and this is obvious anyway, JIT compilers do not optimize anywhere near a C++ compiler (just look at compilation times – even when you strip off the C++ specific parsing and stuff – then you get the idea). Additionally, and that was the point on which I am not convinced, Herb pointed out that even static compilers for .NET would not be as fast because of language tradeoffs. I bet some language extensions here and there could cover for it, but maybe I am wrong ^^.

  14. Hi Herb,
    I have been following your talks for a while and you mention C++ being roughly 2 times faster than the managed languages (while talking about energy savings).

    Where does this data come from? Is it roughly from the fact that for every machine instruction, there is an equal amount of CPU spent on JIT translation?

    Or, is this even in a scenario where managed langs are compiled directly to machine code?

    – Chethan

  15. Ah, looks like Arash made the point on energy consumption just before I did.

  16. One thing that I haven’t seen here, but is very important to note, is that different platforms require different tradeoffs. It would be difficult, for example, to see “sufficiently powerful JIT” ever beat statically compiled languages on some mobile platforms because late-compilation causes a significant hit to battery life.

    Here’s the thing, though. Almost every sufficiently large, sufficiently complex application is effectively a combination of both “managed environment” and “unmanaged environment”. There is almost always some low-level bit/pointer/structure manipulation which optimises some critical inner loop. And there is almost always some kind of high-level scripting capability for user customisation.

    Of course, sometimes the “unmanaged” layer is hidden behind a black box, such as a DBMS. And sometimes the “managed” layer is batch files/shell scripts. They are still a part of your system.

    So in practice, it’s not a case of either/or. The two approaches complement each other. It’s part of the job of an engineer to decide where the boundary is best drawn, taking into account everything from performance criteria to budget constraints. In a sense, the only objectively wrong answer is not to realise that there is a boundary to be drawn.

  17. On a side note: if one simplifies the notion of computation to the amount of energy consumed in performing it, then it is obvious that JIT-based solutions can never be better than properly generated native (PGO) code. Furthermore, because the JIT approach inherently does more “stuff”, in the overwhelming majority of scenarios it will consume more energy and is therefore, by that definition of computation, less efficient. That is still the case even if the excess energy is consumed concurrently and does not affect overall computation time.

  18. @Ben Voigt: You are thinking too narrowly.

    Ada, Modula-3 and D are just a few of the languages with native implementations that have function pointers, and also support genericity with dynamic libraries.

    Many features of the so called managed languages, are also possible to have with pure AOT compilers, it just takes more effort to do so.

    The reason behind the JIT approach is mainly ease of implementation, traded against achievable performance.

  19. @Ben Voigt: Sorry but you are thinking just too conservatively then… A JIT compiler in principle has not much to offer. Most of the nice stuff in .NET comes from IL code and metadata, and of course the huge runtime library and tool support.

  20. Nothing except the definition of JIT compilation. If you are generating the machine code for a thunk, you are performing compilation at run-time.

  21. @Ben Voigt: There is nothing to stop non-JIT code from generating thunks in machine code.

  22. @Ben:

    GCC and ATL both support bound delegates. GCC calls them trampolines. ATL calls them thunks. C/C++ language designers should seriously consider adding them as a standard library feature since they’re so freaking useful. (They’re different from lambdas because they are of type function pointer, not of type class/struct.)
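
A minimal sketch of the function-pointer distinction this comment draws, using only standard C++. The `invoke` and `invoke_with_context` functions, the `Counter` type, and `counter_trampoline` are hypothetical names for illustration. It shows that a lambda without captures converts to a plain function pointer, while a lambda bound to an object does not, because the object reference has to be stored somewhere. When an API happens to offer a context parameter, a hand-written trampoline can carry the object without generating code at run time; an API that takes only a bare function pointer offers no such slot, which is the case the thunk discussion is about.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical C-style API that accepts only a bare function pointer.
using Callback = int (*)(int);
int invoke(Callback cb, int arg) { return cb(arg); }

// Hypothetical variant that also passes an opaque context pointer through.
using ContextCallback = int (*)(void* context, int);
int invoke_with_context(ContextCallback cb, void* context, int arg) {
    return cb(context, arg);
}

struct Counter {
    int base;
    int add(int x) const { return base + x; }
};

// Ordinary function that recovers the target object from the context.
int counter_trampoline(void* context, int x) {
    return static_cast<Counter*>(context)->add(x);
}

int call_stateless() {
    auto add_one = [](int x) { return x + 1; };
    static_assert(std::is_convertible<decltype(add_one), Callback>::value,
                  "a captureless lambda converts to a function pointer");
    return invoke(add_one, 1);
}

int call_bound(Counter& counter) {
    auto bound = [&counter](int x) { return counter.add(x); };
    static_assert(!std::is_convertible<decltype(bound), Callback>::value,
                  "a capturing lambda is an object, not a function pointer");
    (void)bound;  // cannot be handed to invoke(); use the trampoline instead
    return invoke_with_context(&counter_trampoline, &counter, 5);
}
```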

  23. It’s not an accident.

    Generics do not and cannot work the same without a JIT. Right now, you can load a library dynamically and create a generic collection of a type in that library. Take away the JIT, and you limit yourself to collections of types which existed when the collection was compiled.

    .NET generics and C++ templates are not equal.

    The JIT is also necessary to making bound delegates work with APIs that expect function pointers. There’s no place to store the target object handle except inside code generated at runtime.

  24. @Herb: BTW, concerning your hatred?! against “attributes”, my last sentence pretty much provides a strong justification for them. They are really great to play around with language extensions (the ones that are not too invasive) without messing up parsers/tools. You can easily extend compilers like Mono or Clang to understand new attributes and try out your ideas right away. This is an invaluable tool. I agree there are other similar ways like deriving from placeholder objects and stuff like that but they are rather limited and annoying to use. But I agree, these attribute extensions should always be backwards compatible which means the program should run on a compiler not supporting them…

  25. @Herb: First, thank you for explicitly answering like that. Obviously this topic draws a lot of attention, so I don’t seem to be the only one concerned with it ;). I will sure wait for your talk to be online. Given your statements, I feel rather tempted to validate them in experiments ^^. But that is not that easy. Might be a longer research effort. There are a lot of concrete ideas I have for improving .NET performance. Unfortunately it is hard to get them noticed without proof. So I guess I’ll just have to go the route of implementing them in the Mono compiler to show either that my assumptions about C# being able to run at C++ pace are wrong, or that you were too pessimistic about .NET compilers. I agree though, that such a performance leap is rather impossible without C# language extensions. But so far I am confident that we could get away pretty cheap at this, maybe entirely through Attributes in the first place (to play around without messing with parsers/tools)…

  26. I also agree with Doug’s post.

    It is just due to an accident that today’s managed environments, as Microsoft calls them, make use of a JIT.

    There are quite a few languages with GC and other programmer productivity features that compile to plain native code. I am quite convinced that most of these languages are able to have implementations with a performance comparable to current C and C++ compilers.

    Let’s not forget that C and C++ compilers have years of money invested into them, making them able to squeeze every bit of performance out of current systems.

    To pick on Microsoft’s own technology, Bartok’s compiler achieves quite a good code quality when compared with C++. Maybe it should be about time that it is made part of the .NET SDK?

    http://www.eng.auburn.edu/~agrawvd/COURSE/READING/ARCH/The_singularity_system.pdf

  27. When Herb says that managed apps use virtual functions by default it’s rather obvious that he doesn’t mean that in C# all methods are virtual by default – if you think so, you just didn’t get the point. What he wants to say is that managed languages create pressure to solve many problems using virtual dispatch, while the same problems would usually be solved in other (more efficient) ways in C++.

    Moreover, although methods in C# are indeed not virtual by default, each type is virtual-enabled by default and with no way to opt out. Every single object on the heap will have a vtable and several elementary methods inherited from System.Object that are virtual, so I can’t see how C# is not built around virtual functions.
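
For contrast, a minimal C++ sketch of the opt-in model. The `Point` and `VPoint` types and the `extra_bytes_for_virtual` function are hypothetical names for illustration. A C++ type is polymorphic only if it declares or inherits a virtual function. The size difference reported below reflects how mainstream compilers implement virtual functions (a hidden per-object vtable pointer); the language standard does not mandate that layout, so treat the size check as an assumption about the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// No virtual functions: the object holds only its declared members.
struct Point {
    int x;
    int y;
};

// One virtual function makes the type polymorphic.
struct VPoint {
    int x;
    int y;
    virtual ~VPoint() = default;
};

static_assert(!std::is_polymorphic<Point>::value, "Point has no vtable");
static_assert(std::is_polymorphic<VPoint>::value, "VPoint is polymorphic");

// Per-object overhead of opting in to virtual dispatch. On common
// implementations this is the size of one pointer plus any padding.
std::size_t extra_bytes_for_virtual() {
    return sizeof(VPoint) - sizeof(Point);
}
```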

  28. I agree with Doug’s post. We handed over a lot of control to the compiler or the OS over time and most of us did it without looking back nor regret.
    And my guess is, that some programming languages will become even more abstract as to allow for optimizing parallel processing. At some point it will probably be more important to know, if a method is side-effect free than if it’s virtual or static.
    Unless you embrace functional programming, especially loops aren’t abstract enough to let the JIT do all the optimizing. Java’s enhanced for-loop and the upcoming closures are but two steps in this direction.
    Optimizing parallel processing could turn out to be so complex, that programs, which can take advantage of this, are always faster when written in Java or C# as compared to C++.

  29. “In garbage collected languages this is a lot harder to pull off.”

    It’s not. It’s called: don’t allocate memory. If you don’t allocate memory, you won’t run the garbage collector.

    This is exactly how LMAX works on top of the JVM. They preallocate all resources and store them inside of lock-free ring buffers as part of a framework they call Disruptor.

    Worker threads can check-in and check-out work units from these ring buffers in a completely lock-free manner without ever having to allocate any memory. Data is read off the wire into preallocated buffers, which are then used to process stock trades.

    By never running the garbage collector, they are able to satisfy the latency requirements of a stock exchange, even while running on top of the JVM.
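
The preallocate-and-reuse idea can be sketched in a few lines. This is not the Disruptor and is not Java; it is a much simpler single-producer, single-consumer ring written in C++ for illustration, and the names `Trade` and `SpscRing` are hypothetical. All slots are allocated once as part of the ring object, and pushing or popping only copies into or out of an existing slot, so steady-state operation performs no allocation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Trade {
    std::int64_t price;
    std::int64_t quantity;
};

// Fixed-capacity ring for exactly one producer thread and one consumer
// thread. Slots live inline in the object, so they are adjacent in memory.
template <std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called by the producer only. Returns false when the ring is full.
    bool try_push(const Trade& t) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;
        slots_[head & (N - 1)] = t;
        head_.store(head + 1, std::memory_order_release);  // publish slot
        return true;
    }

    // Called by the consumer only. Returns false when the ring is empty.
    bool try_pop(Trade& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return false;
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free slot
        return true;
    }

private:
    std::array<Trade, N> slots_{};        // preallocated storage
    std::atomic<std::size_t> head_{0};    // next slot to write
    std::atomic<std::size_t> tail_{0};    // next slot to read
};
```

A ring like `SpscRing<1024>` would typically be created once at startup and shared between the two threads for the life of the process.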

  30. I think an even bigger factor is memory management. Allocating & deallocating objects on the heap should be avoided in any performance critical code. They are ***FAR*** more expensive than most people think, especially on multi-core systems. In C++ it’s not that hard to create an efficient pool of objects of a given type (where the actual allocation is one big array that you grab elements out of), and have short string members of the object inlined as part of it instead of separate allocations. In garbage collected languages this is a lot harder to pull off. And while the per-allocation overhead of the garbage collector might be less than most malloc & free implementations, it is still to be avoided if you are at all interested in performance.
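
A minimal sketch of the kind of pool this comment describes, assuming a fixed capacity and single-threaded use. The `Order` and `OrderPool` names, the 8-byte symbol field, and the member functions are all hypothetical choices for illustration. The pool makes its allocations once in the constructor; `acquire` and `release` only move pointers on a free list, and the short string is stored inside the object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Order {
    char symbol[8];  // short string stored inline, no separate allocation
    long quantity;
};

class OrderPool {
public:
    explicit OrderPool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        // Push in reverse so the first acquire() returns slot 0.
        for (std::size_t i = capacity; i > 0; --i)
            free_.push_back(&storage_[i - 1]);
    }

    // Pointers into storage_ would dangle after a copy, so forbid copying.
    OrderPool(const OrderPool&) = delete;
    OrderPool& operator=(const OrderPool&) = delete;

    // Returns nullptr when every slot is in use. Symbols longer than
    // 7 characters are truncated.
    Order* acquire(const char* symbol, long quantity) {
        if (free_.empty()) return nullptr;
        Order* o = free_.back();
        free_.pop_back();
        std::strncpy(o->symbol, symbol, sizeof(o->symbol) - 1);
        o->symbol[sizeof(o->symbol) - 1] = '\0';
        o->quantity = quantity;
        return o;
    }

    // Hands a slot back; the caller must not use the pointer afterwards.
    void release(Order* o) { free_.push_back(o); }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<Order> storage_;  // one big allocation made up front
    std::vector<Order*> free_;    // stack of currently unused slots
};
```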

  31. I think that to a certain extent, we all agree on 99% of what is being said and are only disagreeing on semantics. I take it as given that, with enough effort, you can always write a C++ program that beats a .NET program (or any other “managed” system), because if worst comes to worst, you can write a C++ runtime that does whatever the managed runtime does but does it in a way more optimized for the specific task at hand. And I agree that, as Herb says, there are constant tradeoffs between developer productivity and runtime efficiency and that the C++ vs. .NET split manifests some of those tradeoffs.

    There are some disagreements about what Herb said regarding virtual dispatch. In particular, it is worth noting that C# classes do not use virtual dispatch by default, contrary to what is done by Java and contrary to what one might think after reading Herb’s initial post. On the other hand, the .NET framework does make use of virtual dispatch for many cases, WriteLine being one of them (it uses virtual dispatch to convert each formatted object into a string, probably another to convert the string into bytes based on a runtime-selectable encoding, and then uses a third virtual dispatch to send the bytes to the underlying I/O device). It should be noted that C++ also uses virtual or virtual-like dispatch in similar places for its I/O (printf uses switch statements based on type and usually uses a function pointer to connect to the underlying I/O; iostream uses templates to convert the string to char but still uses virtual methods to send I/O to the output device). So both .NET and C++ use virtual dispatch where appropriate, though perhaps it would be fair to say that the .NET framework uses virtual dispatch more often. I think we all agree that virtual dispatch can substantially affect optimization and that in some cases, it can make a big difference in performance. I don’t think WriteLine is a particularly good example (I have never found the virtual dispatch in WriteLine to be the root of any performance issue), but virtual dispatch should definitely be avoided in any performance-critical inner loop. That said, I don’t think virtual dispatch is really a core issue since the problem exists and is solvable for both C++ and C#.

    Herb also mentioned several other costs associated with managed code — a larger runtime, additional metadata, and garbage collection. I think here we probably also agree in substance of fact, though we might disagree regarding the conclusions to draw from the facts. One thing that is clear to me is that if you aren’t taking advantage of the benefits provided by a technology, it is pointless to pay the costs of it. On the other hand, if you are getting a benefit out of a technology, it might be worth a price. C++ offers a wide range of prices and corresponding benefits (from freestanding with no EH or RTTI all the way up to full STL and beyond with some very powerful libraries available). Microsoft’s current managed offerings are somewhat less flexible, but also offer a range with varying costs and benefits (Micro framework, Compact framework, Silverlight, Client framework, Full framework). Other managed frameworks such as Mono offer even more flexibility, compiling a managed project down to a standalone executable. It might be accurate to say that there is a function “price paid versus benefit obtained” for each runtime environment, and that the range and domain of the function will vary for different environments. So in certain situations, it would be stupid to use .NET, and for other situations, it would be silly to use C++. If .NET causes performance problems for your app and C++ does not, and if your app isn’t using any .NET features, there isn’t much point in using .NET. If you need features available in .NET that aren’t in C++ and the .NET performance characteristics meet your needs, you should definitely be using .NET.

    Here’s where I think the big division occurs: what will the future hold? My personal opinion is that the managed runtime will follow a development path similar to many other technologies designed to improve developer productivity — it will be resisted as too slow and expensive, become somewhat more efficient, become mainstream, and eventually get to the point where it beats the performance of the older technology in some cases but not in others (mainly in “bigger” systems, though machines meeting the definition of “bigger” will be more common as time goes on).

    Quick trip down memory lane. Remember the Von Neumann architecture of programmable computers and how inefficient that was compared with the special-purpose hardwired machines, for the sole purpose of trading off efficiency versus developer productivity? And those nasty compilers that generated such horribly inefficient machine code just to make developers happy? And that kernel/user split that takes power out of the hands of the developer and tells the developer that the program must get permission from the OS before doing certain things? What about the substantial performance penalty (not to mention unpredictable pauses) that all user-mode programs have to pay in order to get access to virtual memory? Aren’t all of these just tradeoffs that give up programmer control and performance efficiency in order to improve programmer productivity?

    Each of these technologies adds a level of abstraction with non-zero cost, but they also open up possibilities that were previously infeasible. When introduced, the cost of each technology was prohibitive, but hardware and software refinements have made the costs acceptable for most systems (though still unacceptable on small systems). Eventually each technology always gets to the point where it becomes quite challenging to beat the new technology with the old technology (at least for “bigger” systems), though with enough effort you can always make something using the older technology that is more efficient than what can be achieved using the newer technology. For some applications, it’s still easier to use the older technology (light dimmer switches don’t usually have embedded Pentium processors — they’re usually just hardwired potentiometers or rheostats), but as time goes by, it requires more and more work to match the productivity and even the performance of the new method using the old method (nowadays thermostats often do have programmable CPUs). For example, a good assembly-language programmer can always out-optimize the compiler, but beating the compiler gets harder and less worthwhile every year.

    Of course, for simple or limited hardware, the old technology will still win in both performance and productivity, but each year you have to go lower and lower on the totem pole before you reach that point.

    I still do more C++ development than C# development, but the reasons almost always have to do with dependency management or because I’m working with an existing C++ project. The reasons essentially never have to do with performance or efficiency. In nearly all cases where the .NET framework is an option, it meets my performance needs. The primary exceptions are for programs that do very little work per process, in which the runtime initialization and runtime shutdown dominate the total runtime.

    Herb is definitely correct that the Microsoft .NET framework is not currently aiming for high levels of optimization. The current level of optimization is good enough for most use cases of the target audience. More optimization would require more time spent JITting, and most customers are voting for faster JIT over faster execution. However, if more customers ask for faster runtime in the future, it would be possible to reduce load times with NGEN ahead-of-time compilation and then to use profile-guided re-JIT at runtime. Using techniques proven in JavaScript engines, the hot paths can be determined at runtime, heavily optimized, and then the re-optimized versions can replace the original quick-JIT or NGEN versions. There is no reason why such a dynamic JIT could not beat the performance of any native code (assuming the host machine is “big” enough to support the runtime).

    The trend in programming is to hand over more and more of the management of the runtime environment to the system each year. For example, we no longer manage overlays and instead depend on the OS to manage virtual memory for us. By providing metadata, we pay a price in terms of disk space but we get the benefit that the runtime can manage a lot more aspects of the program’s execution. I currently trust that the OS will do a much better job managing virtual memory than I could if I were to do it by hand. I don’t currently trust that the .NET runtime will do much better than I could do by hand, but it does a pretty good job now and will only get better with time.

    One additional note is that it isn’t just programmer productivity that is purchased with the use of a managed runtime. In addition, you get many capabilities that are hard or impossible to offer with current C++ systems. For example, you get a working module system. The best module system available for C++ right now (to my knowledge) is COM, and that requires the use of a non-C++ tool (MIDL) to make up for the language and runtime shortcomings. WinRT looks very promising, but it isn’t out yet and isn’t standardized in any meaningful way. On the other hand, .NET has type-safe cross-module (and cross-process) communication built-in and standardized, so it is much easier to create a multi-module composable system. As another example, C++ code cannot be analyzed by the system sufficiently to make a solid decision about whether or not it would be safe to run the code, but .NET’s verifier provides a very powerful way to know exactly what a piece of code can and cannot do before running it.

    In the end, it comes down to whether the price you pay for your runtime environment is worth it. The freestanding C runtime environment is very cheap. The minimal C++ runtime environment is quite a bit more expensive (includes memory allocation, exception handling, and RTTI). The complete hosted C or C++ runtime environment is more expensive still. But the price is amortized across multiple instances and is worthwhile if you use the features. The same can be said about any runtime.

    Anyway, sorry for blathering on for so long. Cheers!

  32. “The root cause is much more fundamental: Managed languages made deliberate design tradeoffs to optimize for programmer productivity even when that was fundamentally in tension with, and at the expense of, performance efficiency.”

    This is completely untrue. As evidence to the contrary, I ask you to take a look at LMAX:

    http://www.infoq.com/presentations/LMAX
    http://www.infoq.com/presentations/LMAX-Disruptor-100K-TPS-at-Less-than-1ms-Latency

    LMAX is the world’s only stock exchange with a public API where you can trade directly without going through a broker, and they didn’t write it in C++, they wrote it in Java.

    Please look at how they’re reasoning about the hardware. Their talk could serve as a great introduction to modern hardware and memory/cache architecture in and of itself. These aren’t people who are blinded by abstractions. These are people who understand and can reason about those abstractions in the context of how modern CPUs and memory architectures operate.

  33. I see a lot of commenters defending JIT compilers.

    Do you guys even understand what ‘optimization’ is? If you have a fixed set of instructions that will always have to run, there is no room for optimizations. In these cases C++ will perform faster than any managed language. And if you don’t write overly ambiguous code, the C++ compiler can perfectly optimize away unnecessary instructions where possible.

    As for what JIT compilers could theoretically do… guess what: my fantasy C++ compiler is better than your fantasy JIT compilers will ever be.

  34. @Miguel: Make it `Trace.WriteLine()`, then. That could go to a console, debugger, file, network socket, etc. It’s extremely hard to say that performance doesn’t matter in such cases. That the I/O target is the bottleneck is an oft-stated but rarely validated assumption.

  35. I am not sure why Console.WriteLine is the focus of a discussion on virtuals and performance.

    Console.WriteLine is about I/O, which is about the slowest code path in a program, so any performance slowdown in Console.WriteLine itself is minimal compared with the actual kernel context switch and actual delivery to the output mechanism (worst case: a windowing system showing console output; best case: output redirected to /dev/null).

  36. Why do people persist in using general purpose programming languages? Maybe what is needed is a collection of special purpose languages designed to integrate seamlessly. Kind of like the .NET environment, but with the C++ model rather than the managed model as the global assumption.

    I’ve heard it claimed that 90% of code is input, output and format translation. Why isn’t there a language tuned for exactly that case? Another claim has been made that 90% of code is irrelevant to performance, so why isn’t a lot of that code written in a productivity oriented language? Large scale optimization is hard so why isn’t performance critical code written in a language tuned to support it at the expense of some productivity? Similar statements can be made about threading and a number of other domains.

    If such languages existed such that inter-language calls were just as easy and efficient as intra-language, why not go that way?

  37. So, if we have a JIT VM for C++, wouldn’t the JIT’ed program be faster than the natively compiled program? (assuming the JIT VM can do some optimizations the compiler cannot do).

  38. @Ivan: For one thing, some high-performance lock-free algorithms are not possible unless you solve ABA, so having GC will strictly enable things we can’t otherwise do in portable code using C++11 today.

    @Achilleas: You can have a JIT VM for C++ — for example, VC++ /clr basically does that by compiling Standard C++ to .NET IL as a target — just like you can compile C# and Java to native code — NGEN and newer efforts do this though you still need some runtime but can avoid JIT. However, doing either doesn’t change that the languages were designed with the opposite assumptions, it doesn’t change their semantics, and it doesn’t turn one into the other. For example, when you compile Standard C++ code on VC++ using /clr we’ll emit .NET IL for the code, but the data types stay native (e.g., no metadata, no JIT layout) and the program uses only the native heap (because C++ objects cannot in general tolerate being moved in memory without their knowledge, it is impossible in general to allocate C++ objects directly on a compacting GC heap without pinning the world which means it is no longer a GC heap, or even a functioning heap for that matter in many cases).

  39. I sort of agree with Herb. People have long been arguing that a “Sufficiently Smart JIT Compiler” could use runtime information to beat a static C++ compiler. However, languages get more abstract as compilers get smarter, so it’s a wash. I would not, as Herb does, make the strong statement that “managed” languages can’t be faster than C++. There used to be tons of research in the early 90s on very low-level portable instruction sets (lower than LLVM) that are JIT compiled into native code. It’s basically a static C++ compiler that has one last chance to use a JIT at runtime to beat a static compiler. Similar to, but better than, Apple’s Rosetta: http://en.wikipedia.org/wiki/Rosetta_(software)

  40. Hi Herb,

    I’ve been trying to work out how to make my favourite dynamic language compiled. Imagining writing compilers is a very good thought exercise :)

    Now here you’re talking about JIT in a static language context, but I think there are some similarities, given that Java even chooses to tag and box (while .NET sometimes doesn’t).

    The part of my dynamically-typed code that I can’t work out how to ‘optimise’ to be as fast as, say, C or C++ is dynamic types. When JITing with specialisation you can smooth over a large amount of it at local scope, but you’re stuck with tagged types for everything shared, and they can’t easily be in sequential memory either.

    Can you please assure me that my own investigation is flawed? http://williamedwardscoder.tumblr.com/post/19538827844/why-dynamic-programming-languages-are-slow

    I would love to be wrong and have missed something; it’d be so cool if you could point me to a way around this problem!

  41. @Herb: “GC can help some high-performance data structures and algorithms”
    Can you explain this in a bit more detail? I know of the ABA problem, for example, but I never understood how having another GC thread helps in a lock-free algorithm.

  42. @Doug, re this specifically:

    It is conceivable that a future version of the C++ compiler front-end could emit pure .NET IL,

    Actually, I think you know (but not all readers might) that past and current versions do the above part. Visual C++ 2005 and higher do that under the /clr switch — pure IL for instructions, all native data (e.g., all native types and heap, no .NET types or GC heap).

    and the .NET ahead-of-time compiler would convert that to a native EXE or DLL indistinguishable from the result generated from a similar C# program (as long as the C# program doesn’t use any GC types). The .NET platform is rich enough to allow for that. While this isn’t the direction the platform is currently taking, I would be very hesitant to claim that .NET intrinsically cannot reach those goals.

    I used to be hesitant, but am not any more: It won’t, because it’s designed to reach legitimately different goals that are in tension with full-on performance efficiency, and invoking Turing-completeness to say X could do Y isn’t really helpful or feasible in the end — we should use the tool optimized for X *and* the tool optimized for Y.

    It’s important not to fall into the “one size fits all” == “one language can theoretically optimize for everything” trap, tempting though it is, because the trap is based on (and baited with) the faulty assumption that there are no tradeoffs. There are tradeoffs, and good languages are carefully designed to choose among them. The right way to use each of Java or .NET or C++ is to evaluate what it really is designed to excel at and use it for that purpose.

    Remember, one of the first questions to ask any seller of any new product is: “So, tell me, what *isn’t* this good at?”, or “What *shouldn’t* this be used for?” If the seller can answer that kind of question clearly, then you know two important things: (a) they understand their product; and (b) they’re not trying to pull the wool over your eyes. (If they can’t or won’t, run the other way.)

    Let me give a partial starter answer for C++: C++ is not, and never will be, good at always-on runtime dynamic reflection for all types and methods, as long as that would incur metadata overheads on programs that don’t use it (which as far as we know it does today). Although I could imagine C++ someday allowing opt-in metadata — generating metadata under a compiler switch or allowing types to optionally add the overhead of supplying metadata, so that it’s pay-for-play.

  43. For GC to beat C++ memory deallocation, it could either be done at a later time or by another thread. Doing it later increases memory usage so it’s probably not a good idea beyond the scope of simple benchmarking. As multithreading finally gets easy, moving deallocation to a separate thread loses efficiency, since most cores are likely to be used anyway.

    Besides, C++ has plenty of tricks up its sleeve. For small temporary storage, use a stack based allocator that reduces memory allocation/deallocation cost to almost zero. I’ve used the one written by Howard Hinnant:

    http://home.roadrunner.com/~hinnant/stack_alloc.h
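
    As a rough illustration of the idea (this is not the code from the linked header; `StackArena` and its members are hypothetical names made up for this sketch), a bump allocator over a fixed in-object buffer might look like the following. It assumes the requested alignment is a power of two no larger than `alignof(std::max_align_t)`, and it simply returns `nullptr` when the buffer is exhausted rather than falling back to the heap as a production allocator typically would:

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Hypothetical fixed-size arena: allocation is a pointer bump,
    // deallocation is a single reset() of the whole arena.
    template <std::size_t N>
    class StackArena {
        alignas(std::max_align_t) unsigned char buf_[N];
        std::size_t used_ = 0;

    public:
        // Returns storage for `bytes` bytes aligned to `align`,
        // or nullptr if the arena does not have enough room left.
        void* allocate(std::size_t bytes,
                       std::size_t align = alignof(std::max_align_t)) {
            // Round the current offset up to the next multiple of `align`.
            std::size_t start = (used_ + align - 1) / align * align;
            if (start > N || bytes > N - start) return nullptr;
            used_ = start + bytes;
            return buf_ + start;
        }

        // Releases everything at once; no per-object bookkeeping.
        void reset() { used_ = 0; }

        std::size_t used() const { return used_; }
    };
    ```

    Declaring a `StackArena<256>` as a local variable keeps the storage on the call stack, so short-lived scratch allocations never touch the general-purpose heap.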

  44. @All: In addition to the previous comment, let me contrast C++’s versions of “Console.WriteLine” and “generics” — unlike .NET Console.WriteLine which relies on virtual dispatch, C++ stream I/O is usually about direct function calls; and unlike .NET generics which are all about virtual dispatch, C++ templates are not only always about direct function calls, but inlined-by-default direct calls.
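
    The two dispatch styles contrasted here can be sketched side by side in C++. All names below (`Shape`, `Square`, `PlainSquare`, `total_area_virtual`, `total_area_static`) are hypothetical, and whether a particular compiler actually inlines or devirtualizes either version depends on the compiler and its flags:

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Interface-style design: callers only know the base class.
    struct Shape {
        virtual ~Shape() = default;
        virtual int area() const = 0;
    };

    struct Square final : Shape {
        int side;
        explicit Square(int s) : side(s) {}
        int area() const override { return side * side; }
    };

    // Run-time dispatch: each area() call goes through the vtable
    // unless the optimizer manages to devirtualize it.
    int total_area_virtual(const Shape* const* shapes, std::size_t n) {
        int sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += shapes[i]->area();
        return sum;
    }

    // Template-style design: no base class, no virtual functions.
    struct PlainSquare {
        int side;
        int area() const { return side * side; }
    };

    // Compile-time dispatch: T is fixed at instantiation, so area() is a
    // direct call whose body is visible to the inliner from the first build.
    template <typename T>
    int total_area_static(const T* items, std::size_t n) {
        int sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += items[i].area();
        return sum;
    }
    ```

    Both functions compute the same result; the difference is only in how much the compiler knows about the callee at the call site.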

    Yes, absolutely there’s great heroic effort and effect in the managed language world around things like devirtualization and always-on PGO. But which is faster, and more reliably and often faster: (a) an advanced JIT with 20 years of research and field experience that with lots of runs can often guess well about which calls to safely inline; or (b) aggressive inlining by default all the time from the first build, on top of option (a)? (And note that (b) is inherently a strict superset of the optimizations in (a)…)

    There really is a difference here, and managed languages legitimately prioritize something else (productivity) and then try to make the performance back via optimization. That’s great when you want to optimize programmer productivity first at the expense of mandatory overheads, which you should do when programmer time is your biggest cost and you can afford the extras (e.g., 199x/200x in-house LOB client apps as just one example category where those are often both true and so tends to be well served this way).

    But there’s always an inescapable and fundamental difference between “prevention” and “cure” — when it comes to performance optimization, C++ always chooses “prevention,” and managed languages choose “cure” with the above-mentioned heroic efforts and many more. But the old ounce/pound saying is inescapable; you can’t beat prevention (in part because you can always add the cure after first doing the prevention, but not the reverse), and if you care about performance and control primarily then you should use a language that is designed to prioritize that up front, that’s all.

    Like many of you, I’ve lived the prevention/cure choice many times in many contexts. For example, if database consistency is paramount, then it’s much cheaper and simpler and easier to do even *lots* of extra work up front to keep databases consistent all the time where possible, than to allow inconsistencies to happen and then later try to fix them up with compensating writes or merges — you end up giving up overall performance and/or never making it back to full consistency. (Note that failing to have full consistency can be a perfectly valid choice in a world-class system; many distributed database systems are built on such imperfect-consistency models.) But giving up consistency, which after all is the C in ACID, takes some getting used to, and you have to be very careful to know exactly what you’re getting and what you’re paying if you opt to go down that road. Once you let those databases get out of sync, you *will* pay more in aggregate to get them back in sync and/or won’t get them in perfectly consistent sync again. Be sure it’s worth it and you can afford the compromise, that’s all. It often is, but as in any commercial transaction you have to know both what you’re paying and what you’re getting.

    The C++/managed choices around prevention vs. cure are not fundamentally different.

  45. @Doug: Thanks for the informed points, and some short answers:

    Re virtual: It’s true that C# and Java have differences, but C# is also based on assuming virtual functions as a basis for many things — from the oldest Console.WriteLine accepting big-Oh Object and virtual ToString formatting, to the more recent .NET generics which are all about virtual dispatch.

    Re GC can help some high-performance data structures and algorithms: Absolutely, and I’m personally proposing a GC facility for C++ — not as a replacement to any use of traditional C++ memory management (smart pointers and ref counting), but rather in addition to it via a special GC allocator that you use explicitly exactly in those rare cases when you want it for things like solving ABA problems for high-performance lock-free coding (which is one of the key examples in the category of “some algorithms that are faster with GC”).
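
    For readers unfamiliar with ABA, here is a minimal sketch of how it can bite a naive lock-free stack. Everything in it (`Node`, `NaiveStack`, `aba_demo`) is a hypothetical name, and the interleaving of two threads is simulated deterministically on a single thread so the outcome is reproducible. Re-pushing the same node object stands in for an allocator handing back a just-freed address:

    ```cpp
    #include <atomic>
    #include <cassert>

    struct Node {
        int value;
        Node* next;
    };

    // Naive Treiber-style stack with no ABA protection.
    struct NaiveStack {
        std::atomic<Node*> head{nullptr};

        void push(Node* n) {
            n->next = head.load();
            // On failure, n->next is refreshed with the current head.
            while (!head.compare_exchange_weak(n->next, n)) {}
        }

        Node* pop() {
            Node* old = head.load();
            // On failure, old is refreshed with the current head.
            while (old && !head.compare_exchange_weak(old, old->next)) {}
            return old;
        }
    };

    // Returns true if the simulated interleaving corrupts the stack.
    bool aba_demo() {
        Node a{1, nullptr}, b{2, nullptr}, c{3, nullptr};
        NaiveStack s;
        s.push(&c);
        s.push(&b);
        s.push(&a);                        // stack: a -> b -> c

        // "Thread 1" begins a pop: it reads head and head->next,
        // then is preempted before its compare-exchange.
        Node* seen_head = s.head.load();   // a
        Node* seen_next = seen_head->next; // b

        // "Thread 2" runs: pops a, pops b, pushes a back.
        Node* pa = s.pop();
        Node* pb = s.pop();                // b is now outside the stack
        s.push(pa);                        // stack: a -> c

        // "Thread 1" resumes. head equals a again, so the CAS succeeds
        // even though the stack changed underneath it.
        bool cas_ok = s.head.compare_exchange_strong(seen_head, seen_next);

        // head now points at b, a node that was already removed.
        return cas_ok && s.head.load() == pb;
    }
    ```

    With manual memory management, `b` could already have been freed at that point. Under a tracing collector, a node's address generally cannot be recycled while some thread still holds a reference to it, which is the sense in which GC sidesteps this class of ABA problem.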

    Re managed pegs and native holes: Managed languages were designed under the assumption that VMs, metadata, and GC are “always-on or default-on,” and those are major examples of how managed languages are deliberately designed for optimizing for programmer productivity at the expense of incurring overheads on code that doesn’t need the features they enable. Yes, you can fight your platform and language — but you’re working against it, not with it. I do think the right answer is not to try to make C# do what C++ does, or C++ do what C# does, any more than it would make sense to try to make a hammer do what a screwdriver does or vice versa — each is a finely crafted and optimized tool for its job. It’s just about understanding what the tool is designed for, and working with it as we select and use the right tool for the right job.

    Please do watch the Lang.NEXT talk and then let me know what you think!

  46. Like the commenters before me, I would like to say that you seem to be misinformed about C#, which also provides non-virtual member functions by default, as C++ does.

    Personally I don’t like the “managed languages” term created by Microsoft, as this is only an implementation detail. Does C# become native when I get a compiler that only generates native code, like Bartok does? Or does C++ suddenly turn into a managed language when I use C++/CLI to compile C++ code?

    A proper C# compiler with compilation flags to turn off range checking when desired, escape analysis, and optimizations across module boundaries, can go a long way to provide a performance level that is quite close to what most C++ compilers usually provide.

    Heck, Mono’s AOT compiler is a good example here, especially with the SIMD integration that .NET lacks.

    Modula-3 and Ada are two languages with features that one could consider to be “managed”, but they also compile to native code with performance very close to C++. Sadly, neither really caught on outside niche markets.

    Anyway, I am looking forward to watching your talk on Channel 9, as they are always quite interesting.

  47. Have to agree with barrkel.

    JITs have the fundamental advantage that profile-guided optimization is always turned on. There’s no limit to the degree of optimization, but the JIT has to be careful that expensive optimizations are applied with sufficient selectivity.

    Productivity features aren’t at odds with performance. C++ teaches us that type safety positively affects both. It might be easier to add features to a language with lazier interpretation and looser types, but in the end everything can be optimized to the same machine code once it’s determined what the program really does. Even JavaScript is getting a lot faster these days!

    The reason JITs won’t beat native code is that no JIT bytecode is ultimately better than what can be derived from native code. When JIT technology advances to the point that it can reliably beat native code, native-to-native optimization will simply be widely deployed. The process is already underway. Transmeta did fail… but that was long ago.

  48. Note that your information on virtualization is based on the design of Java and does not apply to .NET. In particular, C# methods are non-virtual by default, while Java methods are virtual by default. You might be confusing this with C#’s use of the callvirt instruction for non-virtual method calls. The use of callvirt doesn’t affect the method resolution. Instead, it simply tells the JIT to validate that the object is non-null before calling a method on that object. This means C# has an extra language-mandated runtime check over C++ (test ecx, [ecx]) for each method call, but otherwise C++ and C# are on equal footing here.

    I also want to note that some of the issues you mention are not intrinsic to the .NET platform itself, but are instead due to the way it is currently implemented.

    It would be possible to implement a .NET runtime with a fully-optimizing JIT that can meet or beat the efficiency of a C++ compiler (beat due to cross-DLL inlining). I agree that it won’t happen for the common case anytime soon since most of the users of .NET care more about app load time than execution efficiency, but it might happen with a fork or special-purpose variant of the runtime, or with a next-generation NGEN. (Of course, C++ could also take advantage of this. Which would be pretty cool, given that AMD and Intel can’t get their story straight regarding FMA3 and FMA4, and SSExyz.)

    It is possible to use .NET (and C#) without the garbage collector and its associated write barriers, either by avoiding dynamic allocations or by using unsafe code for dynamic allocation the same way that C++ does. Only C++/CLI currently makes this even close to convenient, but there is nothing intrinsic in the platform or runtime that makes it impossible to eliminate the overhead. In other words, .NET’s garbage collector could also be made pay-for-play.

    Finally, there are some algorithms for which GC is faster than equivalent manual malloc/free. The overhead of malloc/free and ref-count maintenance is non-zero. So is the overhead of GC. The interesting part is that the variables that go into the calculation of the overhead are very different for each technique, and comparing them is apples vs. oranges. Carefully-managed malloc/free (or unique_ptr/shared_ptr) memory management may or may not beat carefully-managed garbage-collected memory management — it depends greatly on the task at hand. The nice thing about .NET is that you have the option of going either way as needed (though the present languages strongly encourage use of GC memory and the current platform always loads the GC even if your program never uses any GC types).

    I will agree that in the current implementation, the .NET platform is heavily optimized towards developer productivity and runtime verifiability, and all of the .NET languages make available several features that have high runtime costs. C# provides support for GC types but does not provide efficient support for RAII types, and C++ is the opposite. The developer is encouraged to take advantage of .NET platform features anywhere it makes his/her life easier. However, it is also possible to write .NET code that uses the runtime efficiently such that it takes quite a bit of effort to make a C++ program that runs as quickly on modern hardware. Can you beat the .NET version? It’s always possible (since the .NET version eventually boils down to something you could do with C++), but it’s a question of how much effort you have to put out.

    It is conceivable that a future version of the C++ compiler front-end could emit pure .NET IL, and the .NET ahead-of-time compiler would convert that to a native EXE or DLL indistinguishable from the result generated from a similar C# program (as long as the C# program doesn’t use any GC types). The .NET platform is rich enough to allow for that. While this isn’t the direction the platform is currently taking, I would be very hesitant to claim that .NET intrinsically cannot reach those goals.

  49. Cliff Click said in a lecture that JIT compilers for Java are basically doing runtime profiling; he compared it to -O2 optimizations in GCC.
    Regarding NGEN: does it do RAII-like “GC” for objects that it can prove have scoped lifetimes, or is it just a “simple” static C# compiler? And more importantly, if the answer is no: would “RAIIing” in NGEN provide performance gains?

  50. JITs get their performance advantages from profiling the final running code, not from NGEN approaches. The poster child is inlining in situations C++ and similar languages cannot statically determine are safe places, with inline caching. NGEN approaches are the route to reducing startup time, not maximizing performance (because pre-compiled code, even if as late as installation time, is not maximal).
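
    As a hand-written analogue of the guarded inlining described here, the sketch below shows roughly the shape of code a profiling JIT might emit for a call site it has observed to be monomorphic: a cheap type guard, a direct call on the fast path, and the ordinary virtual call as fallback. The names (`Op`, `AddOne`, `Twice`, `apply_guarded`) are hypothetical, and a real JIT would typically compare type or vtable pointers and could deoptimize if the guess stops holding, neither of which is modelled here:

    ```cpp
    #include <cassert>
    #include <typeinfo>

    struct Op {
        virtual ~Op() = default;
        virtual int apply(int x) const = 0;
    };

    struct AddOne final : Op {
        int apply(int x) const override { return x + 1; }
    };

    struct Twice final : Op {
        int apply(int x) const override { return x * 2; }
    };

    // Assumes profiling showed that `op` is almost always an AddOne.
    int apply_guarded(const Op& op, int x) {
        if (typeid(op) == typeid(AddOne)) {
            // Fast path: the guard proved the dynamic type, so the
            // qualified call bypasses virtual dispatch and can be inlined.
            return static_cast<const AddOne&>(op).AddOne::apply(x);
        }
        // Slow path: the guess was wrong, fall back to a virtual call.
        return op.apply(x);
    }
    ```

    A static compiler can only write this guard if it knows `AddOne` is the hot case ahead of time, for example from an offline profile; the JIT learns it from the running program.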

    That’s a distinct issue from the productivity advantages of managed languages. C# does not have virtual functions as a default, BTW; but dynamic languages do, because of the ease with which their approaches let you leverage code to write code (i.e. metaprogramming). They explicitly trade performance for productivity, and hope to get some, but not all, of that performance back with optimizations that come from profiling. But that doesn’t stop you from using a non-virtual by default language like C# with a runtime that’s ultimately able to use dynamic optimization. GC too is a red herring; memory safety greatly increases the certainty with which the compiler can understand semantic intent, but there is no requirement that performance critical code use GC.

    IOW, you’re not being very convincing here, and sound misinformed.
