Feeds:
Posts
Comments

Archive for the ‘Effective Concurrency’ Category

As I mentioned earlier, part of my fall schedule is to give a repeat of this spring’s sold-out seminar: “High-Performance and Low-Latency C++” (October 9-11, London, UK).

I am still getting mails about whether there are alternative/additional European dates for this seminar. Unfortunately, the answer is still no, but since I’m getting inquiries about it let me repeat that part of the earlier post:

On October 9-11, I’ll be in London giving a one-time repeat of “High-Performance and Low-Latency C++” (course details page). This is the same as the public course I gave in Stockholm [in April]; because that course sold out, and I was coming to Europe again for Qt World Summit anyway, we decided to do a single repeat that same week, this time in London.

Notes: (1) Some of you have emailed me asking if there will be other dates/cities, and the answer is no, sorry, I do seminars very rarely and this is the last one I have time to do for the foreseeable future. So if you are interested then this is the one to attend. (2) Some of you have also emailed me to ask whether the seminar will be recorded, and the answer is again no, sorry, the organizers are not set up for that. However, you can find all of my past Effective Concurrency writing (on which parts of this course are based) freely available via this blog, just search for that phrase or use the category tag — there’s a book’s worth of free material written by me in individual-article form.

So, if you’re interested, I hope you’ll be able to attend this October, and I look forward to seeing many of you there.

Read Full Post »

I don’t get to Europe very often apart from ISO C++ standards meetings, but this spring I’ve been able to accept invitations for two English-language European events in the last week of April. If you’re interested in attending, please check out the links, and I look forward to meeting and re-meeting many of you there.

Tue-Thu Apr 25-27: High-Performance and Low-Latency C++ (Stockholm)

On April 25-27, I’ll be in Stockholm (Kista) giving a three-day seminar on “High-Performance and Low-Latency C++.” This contains updated and new material that reflects the latest C++ standards and compilers, with a focus to using modern C++11/14/17 effectively on modern hardware and memory architectures.

Note that the class size is limited to about 100, so that I’ll be able to interact with most attendees directly. Registration just opened recently and hasn’t been widely publicized yet, but today I was told that it’s already over 1/3 full. So if you or your colleagues might be interested in attending, please check out the link above; for group registrations, please contact Alfasoft directly.

Here is the summary, below; for a more detailed topic breakdown see the link above.

Description

Performance and efficiency are C++’s bread and butter, and they matter more than ever on modern hardware: In processors, single-threaded performance improvements are slowing down (unless your code is parallel); in Internet of Things, we are often asked to do more work with less hardware; and in cloud computing, processor/hardware time is often the major component of cost and so making code twice as efficient often means saving close to half the processing cost. Today, getting the highest performance and the lowest latency on modern hardware often means being aware of the hardware in ways that most other programming languages can’t – from hardware caches where simply arranging our data in the right order can give 50x speedups with otherwise identical code, to hardware parallelism where using parallel algorithms turns on high-performance parallel and vector processor hardware that otherwise sits idle.

Additionally, low latency increasingly matters at all scales: In user interfaces, low latency means responsive apps and productive users without the dreaded “wait…” donut; in financial trading, low latency regularly saves large amounts of cash; in embedded real-time systems, low latency is crucial to meeting deadlines and can even save lives. Today, this makes concurrency more important than ever, because it delivers two things: It hides latencies we have to deal with and cannot remove, from disk I/O latency to speed-of-light network latency; and it makes our code responsive by not introducing needless latencies of our own even when we’re not hiding someone else’s latency.

Goal

This intensive three day course will provide developers with the knowledge and skills required to write high-performance and low-latency code on today’s modern systems using modern C++11/14/17. During the training you’ll learn how to get the highest performance and the lowest latency on modern hardware in ways that are unique to C++, including how to arrange data to use hardware caches effectively, and how to use standard and your own custom-written parallel algorithms to harness high-performance parallel and vector processor hardware to compute results faster. You’ll also learn how to manage latency for responsive apps and for real-time systems, and techniques to hide the underlying latencies we have to deal with and cannot remove such as disk and network latency, and to make your own code responsive by not introducing needless latencies in your own code.

Sat Apr 29: ACCU closing keynote (Bristol)

Next, I’ll be heading to Bristol to catch the end of the ACCU 2017 conference, and give the closing talk on “Something(s) New in C++.” No, the title is not intentionally a tease; it’s just that I have several topics available, and I won’t be sure until about a month before the event which will be the best one to speak about. Here is the current abstract:

By the time the ACCU 2017 conference begins, C++17 is expected to be technically complete and in its final approval ballot. What comes next? Will C++ continue growing forever? Can C++ code be simplified? This is a brand-new talk of material I’ve never given before, in which I’ll present one (or more) of three proposals I’m personally working on to further improve C++ post-C++17. All follow a common theme – adding a strategic language and/or library feature to C++ that leads to significant, and sometimes dramatic, simplification of real-world C++ code. I’ll pick which one (or more) of those topics to present sometime in March.

What I can say is that, whichever topic it ends up being, it’ll be something you haven’t seen before that’s forward-looking and aimed directly toward making C++ code simpler and easier… and of course without compromising C++’s model of efficient machine-near abstraction.

I look forward to seeing many of you in Europe this spring.

Read Full Post »

Sarmad Asgher asked a variant of a perennial question:

I am implementing multi producer single consumer problem. I have shared variables like m_currentRecordsetSize which tells the current size of the buffer. I am using m_currentRecordsetSize in a critical section do i need to declare it as volatile.

If you’re in C or C++, and the variable is not already being protected by a mutex or similar, then you need to declare it atomic (e.g., if it’s an int, then atomic_int in C or atomic<int> in C++. Not volatile.

Also is there any article by you on this topic. Please do reply.

There is! See my article “volatile vs. volatile” for the difference and why C/C++ volatile has nothing to do with inter-thread communication.

Read Full Post »

It’s time for, not one, but two brand-new, up-to-date talks on the state of the art of concurrency and parallelism in C++. I’m going to put them together especially and only for C++ and Beyond 2012, and I’ll be giving them nowhere else this year:

  • C++ Concurrency – 2012 State of the Art (and Standard)
  • C++ Parallelism – 2012 State of the Art (and Standard)

And there’s a lot to tell. 2012 has already been a busy year for the pushing the boundaries of both “shipping-and-practical” and “proto-standard” concurrency and parallelism in C++:

  • In February, the spring ISO C++ standards meeting saw record attendance at 73 experts (normal is 50-55), and spent the full week primarily on new language and library proposals, with notable emphasis on the area of concurrency and parallelism. There was so much interest that I formed four Study Groups and appointed chairs: the largest on concurrency and parallelism (SG1, Hans Boehm), and three others on modules (SG2, Doug Gregor), filesystem (SG3, Beman Dawes), and networking (SG4, Kyle Kloepper).
  • Three weeks ago, we hosted another three-day face-to-face meeting for SG1 and SG4 – and at nearly 40 people the SG1 attendance rivaled that of a normal full ISO C++ meeting, with a who’s-who of the world’s concurrency and parallelism experts in attendance and further proposal presentations from companies like IBM, Intel, and Microsoft. There was so much interest that I had to form a new Study Group 5 for Transactional Memory (SG5), and appointed Michael Wong of IBM as chair.
  • Over the summer, we’ll all be working on updated proposals for the October ISO C++ meeting in Portland.

Things are heating up, and we’re narrowing down which areas to focus on.

I’ve spoken and written on these topics before. Here’s what’s different about these talks:

  • Brand new: This material goes beyond what I’ve written and taught about before in my Effective Concurrency articles and courses.
  • Cutting-edge current: It covers the best-practices state of the art techniques and shipping tools, and what parts of that are standardized in C++11 already (the answer to that one may surprise you!) and what’s en route to near-term standardization and why, with coverage of the latest discussions.
  • Mainstream hardware – many kinds of parallelism: What’s the relationship among multi-core CPUs, hardware threads, SIMD vector units (Intel SSE and AVX, ARM Neon), and GPGPU (general-purpose computation on GPUs, which I covered at C++ and Beyond 2011)? Which are most interesting, what technologies are available now, and what’s being considered for near-term standardization?
  • Blocking vs. non-blocking: What’s the difference between blocking and non-blocking styles, why on earth would you care, which kinds does C++11 support, and how are we looking at rounding it out in C++1y?
  • Task and data parallelism: What’s the difference between task parallelism and data parallelism, which kind of of hardware does each allow you to exploit, and why?
  • Work stealing: What’s the difference between thread pools and work stealing, what are the major flavors of work stealing, which of these (if any) does C++11 already support and is already shipping on some advanced commercial C++ compilers today (this answer will likely surprise you), and what needs to be done in the next round for a complete state-of-the-art parallelism story in C++1y?

The answers all matter to you – even the ones not yet in the C++ standard – because they are real, available in shipping products, and affect how you design your software today.

This will be a broad and deep dive. At C++ and Beyond 2011, the attendees (audience!) included some of the world’s leading experts on parallelism and compilers. At these sessions of C&B 2012, I expect anyone who wasn’t personally at the SG1 meeting this month, even world-class experts, will learn something new in these talks. I certainly did, and that’s why I’m motivated to turn the information into talks and share. This isn’t just cool stuff – it’s important and useful in production code today.

I hope to see many of you at C&B 2012. I’m excited about these topics, and about Scott’s and Andrei’s new material – you just can’t get this stuff anywhere else.

Asheville is going to be blast. I can’t wait.

Herb


P.S.: I haven’t seen this much attention and investment in C++ since last century – C++ conferences at record numbers, C++ compiler investments by the biggest companies in the industry (e.g., Clang), and much more that we’ve seen already…

… and a little bird tells me there’s a lot more major C++ news coming this year. Stay tuned, and fasten your seat belts. 2012 ain’t done yet, not by a long shot, and I’ll be able to say more about C++ as a whole (besides the specific topics mentioned above) for the first time at C&B in August. I hope to see you there.

FYI, C&B is already over 60% full, and early bird registration ends this Friday, June 1 – so register today.

Read Full Post »

This month’s Effective Concurrency column, Prefer structured lifetimes – local, nested, bounded, deterministic, is now live on DDJ’s website.

From the article:

Where possible, prefer structured lifetimes: ones that are local, nested, bounded, and deterministic. This is true no matter what kind of lifetime we’re considering, including object lifetimes, thread or task lifetimes, lock lifetimes, or any other kind. …

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

Eliminate False Sharing (May 2009)

Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

The Power of “In Progress” (Jul 2009)

Design for Manycore Systems (Aug 2009)

Avoid Exposing Concurrency – Hide It Inside Synchronous Methods (Oct 2009)

Prefer structured lifetimes – local, nested, bounded, deterministic (Nov 2009)

Read Full Post »

This month’s Effective Concurrency column, Avoid Exposing Concurrency – Hide It Inside Synchronous Methods, is now live on DDJ’s website.

From the article:

You have a mass of existing code and want to add concurrency. Where do you start?

Let’s say you need to migrate existing code to take advantage of concurrent execution or scale on parallel hardware. In that case, you’ll probably find yourself in one of these two common situations, which are actually more similar than different:

  • Migrating an application: You’re an application developer, and you want to migrate your existing synchronous application to be able to benefit from concurrency.
  • Migrating a library or framework: You’re a developer on a team that produces a library or framework used by other teams or external customers, and you want to let the library take advantage of concurrency on behalf of the application without requiring application code rewrites.

You have a mountain of opportunities and obstacles before you. Where do you start?

I hope you enjoy it. Finally, here are links to previous Effective Concurrency columns:

The Pillars of Concurrency (Aug 2007)

How Much Scalability Do You Have or Need? (Sep 2007)

Use Critical Sections (Preferably Locks) to Eliminate Races (Oct 2007)

Apply Critical Sections Consistently (Nov 2007)

Avoid Calling Unknown Code While Inside a Critical Section (Dec 2007)

Use Lock Hierarchies to Avoid Deadlock (Jan 2008)

Break Amdahl’s Law! (Feb 2008)

Going Superlinear (Mar 2008)

Super Linearity and the Bigger Machine (Apr 2008)

Interrupt Politely (May 2008)

Maximize Locality, Minimize Contention (Jun 2008)

Choose Concurrency-Friendly Data Structures (Jul 2008)

The Many Faces of Deadlock (Aug 2008)

Lock-Free Code: A False Sense of Security (Sep 2008)

Writing Lock-Free Code: A Corrected Queue (Oct 2008)

Writing a Generalized Concurrent Queue (Nov 2008)

Understanding Parallel Performance (Dec 2008)

Measuring Parallel Performance: Optimizing a Concurrent Queue (Jan 2009)

volatile vs. volatile (Feb 2009)

Sharing Is the Root of All Contention (Mar 2009)

Use Threads Correctly = Isolation + Asynchronous Messages (Apr 2009)

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking (Apr 2009)

Eliminate False Sharing (May 2009)

Break Up and Interleave Work to Keep Threads Responsive (Jun 2009)

The Power of “In Progress” (Jul 2009)

Design for Manycore Systems (Aug 2009)

Avoid Exposing Concurrency – Hide It Inside Synchronous Methods (Oct 2009)

Read Full Post »