Recommended reading: Why mobile web apps are slow (Drew Crawford)

I don’t often link to other articles, but this one is worth reading.

Why mobile web apps are slow

by Drew Crawford

… So if you are trying to figure out exactly what brand of crazy all your native developer friends are on for continuing to write the evil native applications on the cusp of the open web revolution, or whatever, then bookmark this page, make yourself a cup of coffee, clear an afternoon, find a comfy chair, and then we’ll both be ready.

He offers data (imagine!) to justly debunk many common memes and “easy answers” that routinely litter HN/Reddit/Slashdot comment threads. The piece is also often subtly (and intentionally) hilarious – do watch for the subtle humor, not just the obvious wit.

Don’t be distracted by the author’s viewpoint and emphasis on “iOS and Javascript” development – the article covers lots of important ground, including:

  • developing for ARM vs. x86;
  • developing for desktop vs. mobile;
  • managed vs. native code performance;
  • JIT issues vs. inherent language design tensions;
  • why garbage collection is not at all the panacea it’s often billed to be and often needs to be emphatically avoided (did you realize Apple already jettisoned GC?); and
  • as many of you know already, why if you’re serious about performance you’ll be seriously serious about memory usage and access patterns as a first-order issue.

I agree with most of it, and not just because he quotes from my When Will Better JITs Save Managed Code blog post from last year.

Recommended.

Spoilers

A few takeaways from the conclusion (spoiler alert):

Garbage collection is exponentially bad in a memory-constrained environment. It is way, way worse than it is in desktop-class or server-class environments.

Every competent mobile developer, whether they use a GCed environment or not, spends a great deal of time thinking about the memory performance of the target device

JavaScript, as it currently exists, is fundamentally opposed to even allowing developers to think about the memory performance of the target device

If they did change their minds and allowed developers to think about memory, experience suggests this is a technically hard problem.

asm.js show some promise, but even if they win you will be using C/C++ or similar “backwards” language as a frontend, rather than something dynamic like JavaScript

36 thoughts on “Recommended reading: Why mobile web apps are slow (Drew Crawford)”

  1. And yet, MSFT tries to move developers away from C++ (not Herb or the VC++ but everybody else):
    – I saw Build 2013 videos where MSFT people said that C# delivers better performance than native code.
    – Even worse is the Windows Phone Team that argues that .NET Overhead is just around 5 MB.
    – Most Metro apps from MSFT (Mail, Photos, Music/Videos, Weather, Store, etc.) are all written in HTML/Javascript. The Performance on ARM (Surface RT) is horrible. Apparently, they just don’t care.
    – Most Enterprise products don’t Support C++. Anything from Webservices to Azure, if you are C++ developer, you better look somewhere else.
    – MSFT’s own people (higher positions) promote Xamarin products. Basically a re-implementation of .NET that they SELL. They would rather see you making apps in C# and Xamarin than in C++. Just sad.

    Luckily, there seems to be a glimmer of hope. The new XAML platform is available from C++ (still no UI platform for desktop developers) and we are getting more C++11 features (though more slowly than from any other major C++ compiler).

  2. Thanks for sharing, very nice and interesting article.

    I know it is not really your area, but it would be nice that Microsoft eventually has a real optimizing native compiler for C#, not ngen.

    With the same kind of knobs that languages like C++, Ada and Delphi offer for code generation.

  3. @Herb, thanks for the interesting link.

    That said, I was wondering if you could give us all an update on the status of the addition of an AST to the MS C++ compiler. Is there a time-line for completion? will it be available for use in vs2013 RTM. I think we’re all VERY interested to hear your take on this very important issue.

  4. Thanks for sharing. Very good read indeed.

    @Jasper There is a good alternative to XAML. Way more powerful than XAML. And it’s QML.

    Naturally, as a C++ developer I’ve always avoided using it. Until I used Qt Quick 2.1 in my recent project.
    And, I admit I fell for it.

  5. A very good article. Thank you very much for sharing.

    After reading the whole of it, I gained much appreciation for modern C++’s flexibility, TBH. Maybe it’s now a good time to say that GCs need to collect themselves! :-)

  6. Long but interesting article, but absolutely no surprises, just confirmation of the naysayers’ criticism of .NET and Java when they were first introduced. Great ideas let down by dogma, and as for JavaScript, that’s one compromised sick dog of a language that is well past its sell-by date, but still selling well and causing tummy bugs to anyone who consumes more than a few lines :)

    Main worry is that c++ is headed the same way as JavaScript with so many additions that the environment becomes polluted, the KISS principle is so often forgotten by standard organisations!

  7. @Concerned: The AST work is just an internal compiler implementation detail that I thought I’d mention for general interest, not as a product or API in itself or anything; just a “one of the things we need to do” items. It enables the implementation of new C++11/14 language features, and I gave a timeline for completion of those in the roadmap we shared at Build.

  8. It’s an interesting article to be sure. That chart is really striking in showing what the effect of not using the best possible Garbage Collector implementation can be. The difference between Non-compacting Mark-Sweep which I guess was and still is comparable to the Boehm-GC and Generational Mark-Sweep like in Java in 2005 doesn’t even fit into the same chart!

    Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, the paper from which that chart is from also has other interesting charts. Now if I’m reading the benchmark results from Figure 7a-f right, as the available memory increases between 50->70MB, 70->110MB, 60->90MB, 40->60MB, 45->50MB and 40->50MB for the six different benchmarks, Generational Mark-Sweep GC from 2005 goes on to perform just as well as clairvoyant malloc&free.

    To me in 2013 sitting at a computer with 11936MB of memory just caching files that are currently not used, the tradeoff seems like a no-brainer as long as you can afford it.

  9. @Timo: It’s not just about the performance of GC (requiring 5x space cost to be comparable in perf). Even beyond that, it’s also about determinism (which the article alludes to briefly) and control over where things are allocated (e.g., if you’re serious about performance, you’ll often use arrays and GC’d object graphs are antithetical to performance in many cases — you can still get arrays but you’re fighting against the system to get performant data layouts; and control over hot/cold data separation and cache line layouts both of which are hard in managed environments that often don’t give you the control you need to specify things like alignment directly).
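To make the hot/cold separation and alignment points in comment 9 concrete, here is a minimal C++ sketch. The `Particle*` types, `ParticleSystem`, `PaddedCounter`, and the 64-byte cache-line size are all assumptions made up for illustration; none of them come from the article or the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record split by access frequency: position and velocity are
// touched on every update ("hot"); the name and spawn tick rarely ("cold").
struct ParticleHot {
    float x, y, z;
    float vx, vy, vz;
};

struct ParticleCold {
    std::string debug_name;
    std::uint64_t spawn_tick;
};

// Hot and cold fields live in two parallel contiguous arrays (same index),
// so the update loop streams through only the bytes it actually needs.
struct ParticleSystem {
    std::vector<ParticleHot> hot;
    std::vector<ParticleCold> cold;

    void add(const std::string& name, std::uint64_t tick) {
        hot.push_back(ParticleHot{0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f});
        cold.push_back(ParticleCold{name, tick});
        assert(hot.size() == cold.size());  // the arrays stay index-aligned
    }

    void step(float dt) {
        for (ParticleHot& p : hot) {
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
        }
    }
};

// 64 bytes is an assumed cache-line size. alignas pads and aligns the type
// so that two adjacent counters never share one such line.
struct alignas(64) PaddedCounter {
    long value = 0;
};
```

The point of the sketch is only that the layout is spelled out directly in the type definitions; whether it pays off depends on the access pattern and should be measured.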

  10. @Timo: the trouble is precisely the ‘no-brainer’ and the ‘if you can afford it’. Not every platform is going to have the luxury of copious amounts of memory! It’s not that long ago that smartphones had just 512 MB of RAM, and even that is generous compared to small web-connected devices even today.

  11. @Herb: I am no expert in compiler design. Just someone that got really interested into the field during the university and still follows with passion this subject. Although, like many, I tend to write standard line of business applications in C++/JVM and .NET languages.

    The article you pointed to is a very interesting read, as I mentioned in a previous comment.

    What I think is usually missing from these discussions, is that GC and JIT tend to be mixed up with VM based environments when discussing managed versus native.

    Although, there are quite a few languages with GC, sadly not mainstream, that compile to native code.

    What would actually interest me is to know how well such languages would fare, if their native compilers had an optimizer with the same amount of investment as C and C++ compilers enjoy since the early 80’s.

  12. @pjmlp I suspect the actual GC is probably (hopefully!) well optimised; it’s the data (memory) that it has to process that might be less so. Given the amount of time the JIT has to optimise compared to a native compiler, you may well have an interesting question!

    I’d be surprised if the C++ standards people didn’t consider the viability of GC given its attractions to developers. The current C++ smart pointers are not a million miles away from it, or at least from a sort of declare-and-forget point of view! This might hint that GC is just too expensive?

  13. @M.S. Babaei:

    I made some contributions to Go before giving up on its simplicity. I was thinking more of the Oberon language family, Modula-3, D, Haskell, OCaml, Lisp.

  14. @Brian M:

    C++11 defines an API for GC integration; it is up to the compiler vendors to offer a GC if they feel like it.

    C++ can only have a conservative GC due to the way the language allows for pointer tricks and by being unsafe by default.

    The current smart pointers suffer from performance problems, because every change of ownership carries with it the weight of counter changes. They also bring some complexity, as shown by Herb’s post about how to pass pointers around in function and member function calls.

    It would be nice if the standard also defined that the compilers should be smart pointer aware and were able to elide smart pointer operations if at the exit of a basic block the counter is to have the same value as at the entry. This is done in Objective-C ARC and ParaSail, for example.

    But yeah, what I really miss are comparison with optimizing compilers for other languages, and the main issue is that they don’t exist.

    VM based languages took over the enterprise the last decade and developers not versed in compiler design tend to mix those language features with being managed, while being unaware that quite a few languages with similar set of features, but with native compilers, failed to make an impression in the industry.

    So the existing compilers for those languages are mostly good enough compilers compared with the 30 years of compiler research that has been put into commercial C and C++ compilers.

    For example, how performant would Modula-3 or Active Oberon be if they hadn’t died in academia and had an optimizer with 30 years of research built into the compiler, even with GC? Or Microsoft’s own Sing#, if Singularity hadn’t died as a project?
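The reference-counting cost raised in comment 14 can be observed directly with `std::shared_ptr::use_count()`. The following is a small sketch, not a benchmark; `Widget` and the three function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>

struct Widget {
    int value = 42;
};

// By value: copying the shared_ptr increments the reference count on entry
// and decrements it again when the parameter is destroyed on exit.
long count_seen_by_value(std::shared_ptr<Widget> w) {
    return w.use_count();
}

// By const reference: no copy is made, so the count is left untouched.
long count_seen_by_ref(const std::shared_ptr<Widget>& w) {
    return w.use_count();
}

// A callee that only uses the object and takes no part in ownership can
// accept a plain reference and avoid smart pointers altogether.
int read_value(const Widget& w) {
    return w.value;
}
```

Called with a freshly made `std::make_shared<Widget>()` on a single thread, the by-value function observes a count of 2 and the by-reference function a count of 1. The eliding that the comment attributes to Objective-C ARC would amount to the compiler removing that matched increment/decrement pair automatically.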

  15. @Herb: I don’t feel that “fighting the design” is a totally unbiased characterization. If you’re trying to get low-latency framerates, is the Desktop Window Manager actively working against you so you’re forced to fight the design of Windows? And then when Windows 8.1 gets GetFrameLatencyWaitableObject and SetMaximumFrameLatency, is it because the design of Windows was changed from anti-low-latency to pro-low-latency? No, a more likely explanation is that enough people made enough noise to get some other features bumped for these controls that let you work better with and within this managed windowing environment.

    Likewise garbage collectors provide settings to tune. Likewise if one started providing SetStopTheWorldGCDeferment that would be an incremental improvement rather than a design U-turn. But suppose you don’t want any GC pauses at all and reject GC completely? In that case you can’t go on to make a game that has a screen that says “Stopping the game world while managing memory…” aka “Loading…” nor can you be managing memory for LuaJIT, because that would be ironic.

  16. Wasn’t aware of the API GC in c++ 11 but if optional no real point.
    Your comment does remind me of the one flaw in biological evolution: survival of the fittest is not the same as the best solution, and so it is with languages and compilers. If something is just good enough, then it often grows in popularity beyond potentially better solutions. JavaScript and C/C++ have typically eclipsed potentially better solutions as both do the job well enough to survive – that’s evolution, and it boils down to right time and place!

  17. @HerbSutter: “GC’d object graphs are antithetical to performance” -> I don’t see what “GC’d” has to do with the performance of object graphs vs arrays. Object graphs have poor data locality whether they’re GCed or not.

    “you can still get arrays but you’re fighting against the system” -> How are you fighting the system? Arrays and value types are first-class in .NET and popular .NET languages. The syntax, semantics and performance characteristics are the same. If using first-class citizens of the runtime and language is fighting the system then worse should be said about automatic pointers, memory pools and such techniques in C++.

    “control over hot/cold data separation and cache line layouts both of which are hard in managed environments that often don’t give you the control you need to specify things like alignment directly” -> .NET offers the same control over data layout as C++. Alignment is certainly a valid example where things are easier in C++ but it’s not exactly rocket science to do it in managed code either.

    The article applied to Javascript and I find that generalizing to all GC’d/managed environments is unfair. .NET and C# were clearly designed with performance and low-level control in mind unlike Javascript, and save from writing intrinsics or inline assembly there are precious few cases where dropping down to native code is necessary. It’s entirely possible to avoid GC by managing object pools and arrays exactly like it’s done in C++, and it’s not “fighting the system” any more than trying to do the same in C++.

  18. @Zeckul: Alas, ‘t’ain’t so. Quick example — given a class MyClass { }, try an array of MyClass in C++ and C#. Are they really exactly the same, including memory layout which was a major aspect of the point? Quick read: http://stackoverflow.com/questions/6943229/c-sharp-equivalent-of-c-vector-with-contiguous-memory .

    Longer answer follows:

    Yes, the emphasis is on “graphs” which popular GC-based languages are built around and encourage. Having said that, GC’d object graphs add an additional layer of performance overhead because they need to be traversed by the system, which adds extra memory operations and in some cases contention with program threads/cores.

    You can get array allocation, but you have to fight the system’s natural way of working and don’t get full support. For example, common limitations/gotchas of arrays in managed languages include that you can only use them with true contiguity with a subset of types (typically fundamental/value types; arrays of big-Oh Objects aren’t contiguous but are really arrays of references), you can’t make use of contiguity unless you pin (that’s a major performance penalty), you can only make arrays up to a 32-bit index size, multidimensional arrays are not contiguous, typically there’s no alignment control, and/or other limitations. BTW, I said “GC-based languages” in the previous paragraph because the language design itself often assumes a GC, making node-based allocation and GC semantics inherent in the language in places. You’re fighting those assumptions and that normal common-path way of working when you opt for arrays, that’s all.

    True (fully contiguous) arrays just aren’t used as much in managed code, whereas they’re the recommended default container ([] and std::vector) in C and C++ code. I haven’t done this experiment, but try counting the mentions of techniques that use arrays in books/articles about managed code vs. books/articles about native code. Note: If you want to try this experiment, be careful when you count, because types called “*Array*” in C# and Java are not always actually arrays in the contiguous sense we mean! — which is symptomatic of what I’m talking about.
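The “array of MyClass” example from comment 18 can be sketched on the C++ side as follows. `MyClass` here is a made-up class with a constructor and private state, and `is_contiguous`, `make_values`, and `make_references` are hypothetical helpers written for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// A made-up user-defined class: constructor, private state, member functions.
class MyClass {
public:
    explicit MyClass(int id) : id_(id), weight_(1.0) {}
    int id() const { return id_; }
    double weight() const { return weight_; }
private:
    int id_;
    double weight_;
};

// True if each element sits exactly sizeof(MyClass) bytes after the previous
// one, i.e. the objects themselves are stored back to back.
bool is_contiguous(const std::vector<MyClass>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        auto prev = reinterpret_cast<std::uintptr_t>(&v[i - 1]);
        auto curr = reinterpret_cast<std::uintptr_t>(&v[i]);
        if (curr - prev != sizeof(MyClass)) return false;
    }
    return true;
}

// Array of MyClass: the objects live inside the vector's single buffer.
std::vector<MyClass> make_values(std::size_t n) {
    std::vector<MyClass> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.emplace_back(static_cast<int>(i));
    return v;
}

// Rough analogue of an array of a reference type: the buffer holds only
// pointers, and each object is a separate heap allocation.
std::vector<std::unique_ptr<MyClass>> make_references(std::size_t n) {
    std::vector<std::unique_ptr<MyClass>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::make_unique<MyClass>(static_cast<int>(i)));
    return v;
}
```

For any `n`, `is_contiguous(make_values(n))` holds by construction of `std::vector`. The pointer version guarantees only that the pointers are contiguous; where the pointed-to objects end up is decided by the allocator.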

  19. C# and Java are not the only languages with GC, there are quite a few that offer the same memory control that C and C++ do, besides GC.

    As mentioned, so far they have failed to make a dent in the mainstream due to lack of corporate support, but that doesn’t mean we should now take C# and Java as examples of the only way to implement GC in systems languages, or of its performance.

  20. @HerbSutter: thanks for clarifying your point. The equivalent of an array of MyClass in C# would be an array of MyClass* or MyClass& in C++; the equivalent of an array of MyClass in C++ would be an array of MyStruct in C#. Let’s not compare apples to oranges. .NET arrays and Lists are contiguous blobs of memory exactly like native arrays and std::vectors. It’s up to the programmer, in both cases, to make use of that contiguity for data locality as need be.

    I understand your point on different emphasis, but even then I find it unfair since, as I said, value types are first class citizens of .NET; they’re fully supported and easy to write and use. It’s not like value types were an esoteric feature (as could be said of many C++ features ;) ). For example, all geometric and math primitives in the XNA Framework are value types. If someone cares about performance and avoiding GC overhead it’s entirely feasible to rely heavily on value types and contiguous arrays in managed code, and it’s not particularly difficult or obscure code to write either. At any rate it’s certainly less obscure than most C++ code out there.

    “you can’t make use of contiguity unless you pin (that’s a major performance penalty)” -> You get the data locality performance benefits whether you pin or not, but granted, you can only use pointer arithmetic if you pin. That said, pinning only hurts performance if the array is small (LOH is never moved in memory) and if GC happens to run while it’s pinned; if you only pin for brief amounts of time as is idiomatic with the fixed statement, and do some manual memory management (object pools, value types etc) to avoid putting too much pressure on the GC, the impact can be kept minimal.

    “you can only make arrays up to a 32-bit index size” -> Not likely to be an issue even for most performance-sensitive programs. One can always allocate unmanaged memory if need be, exactly as they would in native code.

    “multidimensional arrays are not contiguous” -> What is called a “multidimensional array” in .NET *is* contiguous in memory ( http://stackoverflow.com/a/597790/154766 ), although array accesses are unfortunately not inlined by the CLR. Jagged arrays require an indirection per dimension, but so they do in C++ as well, i.e. an unsigned char** is two layers of indirection exactly like a C# byte[][].
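The distinction between a contiguous multidimensional array and a jagged one can be sketched in C++ as below. `Grid2D`, `Jagged`, and `make_jagged` are hypothetical names for this illustration; the sketch shows only the C++ side and says nothing about how the CLR lays out its arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D grid stored in ONE contiguous allocation: element (r, c)
// lives at index r * cols + c, so there is no per-row indirection.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0) {}

    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return cells_[r * cols_ + c];
    }

    const int* data() const { return cells_.data(); }
    std::size_t size() const { return cells_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> cells_;
};

// Jagged layout: an outer array of separately allocated rows. Each access
// j[r][c] first loads the row's buffer pointer, then the element.
using Jagged = std::vector<std::vector<int>>;

Jagged make_jagged(std::size_t rows, std::size_t cols) {
    return Jagged(rows, std::vector<int>(cols, 0));
}
```

With `Grid2D g(3, 4)`, writing `g.at(1, 2) = 7` stores into `g.data()[1 * 4 + 2]`. The jagged version gives the same logical result through one extra pointer hop per row.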

    “typically there’s no alignment control” -> There’s no built-in wrapper for alignment control but nothing stops someone from allocating a chunk of unmanaged memory (using aligned malloc if need be) and using pointers in C#. While this is not particularly easy, I don’t find C++ to make things so much easier in that regard either.

    “True (fully contiguous) arrays just aren’t used as much in managed code, whereas they’re the recommended default container ([] and std::vector) in C and C++ code.” -> Strange, I thought [] and List were the recommended default containers in C# as well. If you browse MSDN samples for C# those are by far the most commonly used collections. But if managed code literature makes heavier use of fancy data structures in .NET, perhaps it is simply because they’re more abundant and easier to use.

    Similar to how C++ lets you opt-in to use certain “heavyweight” features (virtual methods, exceptions), C# lets you opt-out of GC and type checking (value types, unsafe code). Certainly the defaults and emphasis are different, but let’s not underestimate the possibilities of managed code.

  21. @Zeckul: “The equivalent of an array of MyClass in C# would be an array of MyClass* or MyClass& in C++; the equivalent of an array of MyClass in C++ would be an array of MyStruct in C#. Let’s not compare apple to oranges.”

    “Array of MyClass” is exactly an apples-to-apples comparison that shows how the barrel of apples is much smaller on the managed side. In C++ you need to resort to an array of * or & only when you want polymorphism, for fundamental reasons that would apply to any language. In C# you have to resort to its equivalent for nearly all types.

    You simply cannot do a true (contiguous) array of any old type in C# or Java. You can in C++. Yes, you can do it for all value types (which can be user-defined types at least in C#) but that’s a small and restricted subset of types that are not useful for all kinds of types — objects must be bitcopyable, you cannot even inherit, etc. They are first-class in a sense, but they are absolutely not general-purpose. When you want an array of arbitrary Objects (cap intentional), including boxed value types, you can’t get it contiguously.

    To get contiguity, you have to drop down to value types, which is very restrictive and which I correctly said is actively working against an environment and language built around an Object base class of nearly all types — including even boxed value types which can’t be held contiguously.

  22. @HerbSutter: I think that even in an application containing lots of performance-sensitive code, only a small fraction of types need to be placed contiguously in arrays for fast access in tight loops. Moreover, these types are typically raw-data-like, i.e. numerical or geometric data for instance. In addition, implementation inheritance would typically not be used for such types anyway because it becomes difficult to understand their memory layout, which is kind of the whole point. Note that value types can implement interfaces (and these methods can be called without boxing with proper use of generics).

    With that said, while you are right to say that value types are limited compared to reference types in .NET, these restrictions are not an issue in the concrete cases where value types are indeed required. I would be curious to see an example to the contrary.

  23. Yet I’ve NEVER experienced anything even remotely close to a GC slowdown on my Windows Phone. Or when trying out older WP7 phones or low memory WP8s.

  25. Hi Herb,

    This is the last email I have received from Sutter’s Mill. Do you close for the summer?

    I miss your emails so hope they start coming again soon!

    Thanks,

    Jan Arrol

  26. I’m sorry but I urge you to reconsider your appreciation of Crawford’s article, and if possible, post a correction mentioning what I now will, for I don’t have the publicity to correct the misinformation spread by Crawford. As is apparent from my e-mail exchange with him (linked below), he is extremely ignorant on garbage collection, loves misinterpreting academic papers, and the claims in his article regarding garbage collection are clearly a result of misinterpretation of a linked paper; the article grossly misrepresents the results of the paper, whose value is, I think, dubious even if interpreted correctly.

    http://taylan.uni.cx:8080/webapps-js-gc/

    If you have difficulties with the above link, contact me via taylanbayirli at Google Mail and I will send you a copy.

    It disturbs me greatly that an article with this level of misinformation is nevertheless written this well, and enjoys the publicity and appreciation it does, please help correct this for the sake of the public’s information and education.

  27. Basically, it boils down to this:

    In case you really absolutely have to use JavaScript with its high overall performance impact in an environment without massive headroom for memory, memory throughput, CPU power, &c., by all means do so, but:

    Keep things as simple and flat as possible. Do not use higher abstraction idioms and automation patterns that by definition generate lots of heap allocations.

    Refrain from following the otherwise great suggestion to use immutable instances for avoiding shared mutable state.

    Just reuse preallocated instances of what you need, i.e. do your own flat memory management, and learn to live with all kinds of coroutine safety issues due to shared mutable state (it doesn’t take threads for that becoming an issue at all), and just generally behave as if you were writing C code for a micro controller with harsh memory limits.

    And by all means refrain from heavy manipulation of the DOM tree – instead switch things to being visible / invisible, and just set basic numeric attributes for changing their appearance – may be tricky sometimes, but this way the tree’s structure isn’t modified, which would be way more heavyweight.

    Also, restrict use of hash maps (including objects, as they are implemented in JS using them) – it’s best to initialize all complex data on startup and leave their structure alone from then on. Just setting values for existing keys is not that much of a problem, of course – without being allowed to do that, mutating instances of objects wouldn’t be possible as well…

    Just forget about the idea that you’re dealing with a platform that supports OOP, except for data structures that can be set up once in the beginning and just used from that point without changing the object graphs.

    It really is just a question of selecting an appropriate architecture, and refraining from doing the kind of UI projects this way that need all the flexibility and performance you get by writing native code.

    Due to the inherently lower degree of abstraction that results from the flat approach I am proposing, it is often less work to just write the UI with the right degree of abstraction in Objective C and C++ (and / or maybe Java / C#) for the three predominant mobile platforms, plus maybe a shared core in C++, than to jump through all the hoops that become necessary if you give up most tools for achieving a higher level of abstraction as described.

    The time saved by being able to correctly model state with immutable value representations and not having to chase race conditions caused by mutation from different parts of the call stack alone will probably more than justify doing several things twice, or even thrice :)
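The “reuse preallocated instances” advice in comment 27 is essentially an object pool. The comment is about JavaScript; the sketch below is written in C++ and only illustrates the pattern. `Bullet` and `BulletPool` are hypothetical names, and the linear scan is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A made-up small object that a game or UI might create and discard often.
struct Bullet {
    float x = 0.0f;
    float y = 0.0f;
    bool active = false;
};

// Fixed-capacity pool: all storage is allocated once in the constructor.
// acquire() and release() only reset fields and flip a flag, so no heap
// allocation happens after startup.
class BulletPool {
public:
    explicit BulletPool(std::size_t capacity) : items_(capacity) {}

    // Returns the first inactive bullet, reset to defaults and marked
    // active, or nullptr when every slot is in use.
    Bullet* acquire() {
        for (Bullet& b : items_) {
            if (!b.active) {
                b = Bullet{};
                b.active = true;
                return &b;
            }
        }
        return nullptr;
    }

    // Hands a bullet back to the pool; the slot can then be reused.
    void release(Bullet* b) {
        assert(b != nullptr);
        b->active = false;
    }

    std::size_t active_count() const {
        std::size_t n = 0;
        for (const Bullet& b : items_) n += b.active ? 1 : 0;
        return n;
    }

private:
    std::vector<Bullet> items_;
};
```

A pool of capacity 2 hands out two distinct bullets, returns `nullptr` on the third request, and reuses the first slot once it has been released. The shared-mutable-state hazard the comment warns about is visible here too: a caller that keeps using a pointer after `release` will silently alias the next owner of that slot.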
