
Sutter’s Mill

Herb Sutter on software, hardware, and concurrency


James Hamilton on reliability

2012-02-26 by Herb Sutter

Don’t trust hardware or software; then you can build trustworthy hardware and software.

James Hamilton on how to write reliable software in a world where anything that can fail, will fail.

Posted in Hardware, Software Development | 2 Comments

2 Responses

  1. on 2012-02-26 at 9:08 pm Pulu

    The link appears to be broken. Ironically.


  2. on 2012-02-28 at 1:53 am paercebal

    Tested on 2012-02-27 and 2012-02-28 (Paris time), and the link works.

    Now, after reading the article, I understand the risks (and thus the means to avoid them) related to critical systems like satellites and other space modules, but for a normal application like the ones most of us work on, the prospect of double/triple/checksum testing anything from hardware to software is daunting.

    Quoting the text: “At scale, error detection and correction at lower levels fails to correct or even detect some problems. Software stacks above introduce errors. Hardware introduces more errors. Firmware introduces errors. Errors creep in everywhere and absolutely nobody and nothing can be trusted [...] Upon deep investigation at some customer sites, we found the software was fine, but each customer had one, and sometimes several, latent data corruptions on disk. Perhaps it was introduced by hardware, perhaps firmware, or possibly software”

    I just assumed that hardware corruption was a “fact of life” (one of my HDs died recently, after a long, data-corrupting agony, so I have tasted that bitter medicine) and that I had better things to do (like correcting my own bugs) than trying to protect the customer from hardware faults or other things I had no control over.

    C++ has multiple virtues, but immunity to hardware problems, power interruptions, or even alien invasion is not among them (or perhaps alien invasion would be OK, if handled on a Mac by Jeff Goldblum).

    And I still believe this (if not, I would be panicking right now).

    Still, this is interesting to know, because it could very well apply to different components of the same “application” working together (like a rich client, a server, and its plugins, all forming one large application as far as the client is concerned)… Food for thought…



