The Future of the C++ Language

Hello, world!  Today, I (Stephan T. Lavavej, library dev) would like to present one question and one Orcas bugfix.

 

First, the question: What is the future of C++?  Or, phrased crudely, does C++ have a future?  Will it grow and evolve, with programmers using it in new application domains and finding ways to use it more effectively?  Or will it stagnate, with programmers using it in fewer and fewer application domains until nothing new is being invented with it and it enters “maintenance mode” forever?  After nearly three decades of explosive growth for C++, what is going to come next?

 

This question has a finite horizon.  No language can possibly be eternal, right?  (Although C is certainly making a good run for it.)  I don’t expect C++ to be vibrant in 2107, or even 2057.  50 years is an almost incomprehensible span of time in the computer industry; the transistor itself is turning 60 years old this year.  So when I ask, “what is the future of C++?”, I’m really asking about the next 10, 20, and 30 years.

 

Here’s how I see it.  First, consider C++’s past.  As it happens, Bjarne Stroustrup recently released an excellent paper covering C++’s recent history, “Evolving a language in and for the real world: C++ 1991-2006”, at http://research.att.com/~bs/hopl-almost-final.pdf .  There’s also a wonderful 1995 interview with Alexander Stepanov at http://stepanovpapers.com/drdobbs-interview.html which explains C++’s machine model.

 

C++’s machine model has a relentless focus on performance, for several reasons.  Being derived from C, which was “fat free”, is one reason – in the realm of performance, C++ has never had to lose weight.  It’s just had to avoid gaining weight.  Additions to C++ have always been structured in such a way as to be implementable in a maximally efficient manner, and to avoid imposing costs on programmers who don’t ask for them.  (As the Technical Report on C++ Performance, now publicly available at http://standards.iso.org/ittf/PubliclyAvailableStandards/c043351_ISO_IEC_TR_18015_2006(E).zip , explains, exception handling can be implemented with the “table” approach, which imposes minimal run-time overhead on code that doesn’t actually throw.  VC uses the “code” approach on x86 because of historical reasons, although it uses the “table” approach on x64 and IA-64.)  Historically, C++ ran on very small and slow machines that couldn’t bear any unnecessary costs.  And now, C++ is used to tackle huge problems where performance is critical, so unnecessary costs are still unthinkable!

 

Aside from the elevator controllers and supercomputers, does performance still matter for ordinary desktops and servers?  Oh yes.  Processors have finally hit a brick wall, as our Herb Sutter explained in 2005 at http://gotw.ca/publications/concurrency-ddj.htm .  The hardware people, who do magical things with silicon, have encountered engineering limitations that have prevented consumer processors from steadily rising in frequency as they have since the beginning of time.  Although our processors aren’t getting any slower, they’re also not getting massively faster anymore (at least, barring some incredible breakthrough).  And anyways, there isn’t plenty of room at the bottom anymore.  Our circuits are incredibly close to the atomic level, and atoms aren’t getting any smaller.  The engineering limit to frequency has simply arrived before the physical limit to circuitry.  Caches will continue to get larger for the foreseeable future, which is nice, but having a cache that’s twice as large isn’t as nice as running everything at twice the frequency.

 

As programmers, we are faced with a future that looks radically different from what we’re used to: the processors we have today are about as fast as we will ever have.  The computer industry undergoes constant change, of course, but we rather liked the kind of change that made our programs run twice as fast every couple of years with no extra work on our part.

 

Undaunted, the hardware engineers have begun putting multiple cores in each processor, which is actually increasing overall performance quite nicely.  (I’d sure like to have a quad-core machine at work!)  But not everything is as embarrassingly parallel as compiling.  Single-core performance still matters.  And the problems that we, as programmers, are asked to solve are getting bigger every year, as they always have.

 

Therefore, I say that C++ is uniquely positioned to weather this performance storm.  Other languages will continue to find uses in application domains that aren’t performance-critical, or that are embarrassingly parallel.  But whenever the speed at which an individual core crunches stuff matters, C++ will be there.  (For example, 3D games.  When Halo Infinity is released in 2027 – and yes, I totally just made that up – I fully expect it to be written in C++.)

 

Among C++0x’s biggest core language changes will be variadic templates, concepts, and rvalue references.  The first two will make writing templates a lot more fun.  That’s great, because templates are a powerful way to produce highly efficient code.  And the third will address one of the flabbiest areas in C++03 – its tendency to make copies of values.  (Things that have value semantics are great – unnecessary copies aren’t.)  By eliminating unnecessary copies through “move semantics”, rvalue references will make value-heavy code, like any code that uses the STL, significantly faster.  The future is bright!
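To make the “unnecessary copies” point concrete, here is a minimal sketch written in the rvalue-reference syntax as it was eventually standardized in C++11 (an assumption on my part; the draft wording could differ).  Tracked is a hypothetical type invented purely for illustration.  It counts how often it is copied versus moved when temporaries are pushed into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type (not from any library) that counts copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    std::string payload;

    explicit Tracked(std::string s) : payload(std::move(s)) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
    // noexcept lets std::vector move (rather than copy) elements when it grows.
    Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
    Tracked& operator=(const Tracked&) = default;
    Tracked& operator=(Tracked&&) = default;
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Pushes n temporaries (rvalues); returns how many copy constructions happened.
int copies_when_moving(std::size_t n) {
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Tracked(std::string(100, 'x')));  // binds to push_back(T&&)
    }
    return Tracked::copies;
}

// Pushes the same named object (an lvalue) n times; returns the copy count.
int copies_when_copying(std::size_t n) {
    Tracked::copies = 0;
    Tracked::moves = 0;
    const Tracked original(std::string(100, 'x'));
    std::vector<Tracked> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(original);  // binds to push_back(const T&)
    }
    return Tracked::copies;
}

// Small usage check; call it from main() to try the sketch.
void check_move_demo() {
    assert(copies_when_moving(10) == 0);
    assert(copies_when_copying(10) == 10);
}
```

The temporaries are moved into the vector instead of being copied, so the 100-character payload is handed over rather than duplicated.  Copies only happen when the code genuinely asks for a second value.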

 

Why am I thinking about performance?  Well, because one of my fixes has reduced the performance of the Standard Library.  Before you scream in agony, let me explain…

 

In my first VCBlog post, I mentioned that I was working on something which has since been checked into Orcas (VC9).  I call it The Swap Fix.  To recap, Visual Studio 2005 (VC8) introduced new iterator debugging and iterator checking features.  Iterator debugging, enabled by _HAS_ITERATOR_DEBUGGING, performs powerful correctness verification.  Iterator checking, enabled by _SECURE_SCL, performs minimal checks that serve as a last line of security defense.  For example, _SECURE_SCL will terminate a program that triggers a heap overrun with a vector iterator.

 

All that is explained by MSDN documentation.  The story behind this is interesting.  The _HAS_ITERATOR_DEBUGGING functionality was provided by Dinkumware, the company that licenses their most triumphant implementation of the Standard Library for inclusion in Visual Studio.  The _SECURE_SCL functionality was added by Microsoft, in order to improve the security of programs running on Windows.  In order to perform their checks, both _HAS_ITERATOR_DEBUGGING and _SECURE_SCL make iterators contain additional data members, such as pointers to their parent containers.  _HAS_ITERATOR_DEBUGGING, because it is enabled by default in debug mode (and not obtainable in release mode), also builds singly linked lists that allow containers to refer to all of their iterators.  This is expensive performance-wise, but performance is not critical in debug mode, and this enables excellent checks.

 

_SECURE_SCL, because it is enabled by default in release mode, strives to impose minimal performance penalties.  Therefore, when it is enabled, although iterators have pointers back to their containers, containers don’t have pointers to their iterators.  (Updating “iterator lists” is too time-consuming for release mode.)

 

Now, VC8 RTM/SP1 had a bug when _HAS_ITERATOR_DEBUGGING was disabled and _SECURE_SCL was enabled (i.e. the default for release mode).  When you have persistent iterators into two containers, and then swap the containers, the Standard requires that the iterators remain valid (23.1/10).  Unfortunately, the parent pointers that _SECURE_SCL added to iterators were broken by such a swap.  (The containers being swapped have no way to find the iterators which point into them.)  Dinkumware’s _HAS_ITERATOR_DEBUGGING is immune to this problem since it can walk the iterator lists and update all of the parent pointers.  This option is not available to _SECURE_SCL.
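For readers who haven’t run into this requirement, here is a small sketch of the kind of code the Standard guarantees must work.  The function name is mine, chosen for illustration.  Iterators obtained before the swap keep referring to the same elements afterwards, even though those elements now belong to the other container.

```cpp
#include <cassert>
#include <vector>

// Returns true if iterators taken before a swap still refer to the same
// elements after it (those elements now live in the other container).
bool iterators_survive_swap() {
    std::vector<int> a = {1, 2, 3};
    std::vector<int> b = {10, 20};

    std::vector<int>::iterator ia = a.begin();  // refers to the element 1
    std::vector<int>::iterator ib = b.begin();  // refers to the element 10

    a.swap(b);  // a now holds {10, 20}; b now holds {1, 2, 3}

    // ia followed its element into b, and ib followed its element into a.
    return *ia == 1 && ia == b.begin() &&
           *ib == 10 && ib == a.begin();
}

// Small usage check; call it from main() to try the sketch.
void check_swap_demo() {
    assert(iterators_survive_swap());
}
```

A checked iterator that still believes its parent is the original container would wrongly reject a perfectly valid dereference here, which is the conformance bug described above.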

 

In order to fix this conformance bug, The Swap Fix in Orcas makes every Standard container own an additional dynamically allocated object, imaginatively called “the aux object”.  Each container holds a pointer to its aux object, which holds a pointer back to the container.  Each iterator, instead of holding a pointer directly to its parent container, now holds a pointer to its parent container’s aux object.  It’s true: everything can be solved by an extra level of indirection.  When containers are swapped, they also swap their aux objects.  This allows the containers to “tell” their iterators their current location, without having to know where they all are.  The result is that VC9 will be conformant even under _SECURE_SCL.
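The pointer arrangement described above can be sketched with a toy model.  This is not the actual Orcas implementation: ToyContainer, Aux, and ToyIterator are hypothetical names, the int member stands in for the real contents, and re-pointing each aux object at its new owner during swap is my assumption about how the back pointers are kept up to date.

```cpp
#include <cassert>
#include <utility>

struct ToyContainer;

// The "aux object": a small heap-allocated node that knows its current owner.
struct Aux {
    ToyContainer* owner;
};

// Hypothetical stand-in for a Standard container.
struct ToyContainer {
    int contents;  // stands in for the real elements
    Aux* aux;      // the one extra pointer each container now carries

    explicit ToyContainer(int c) : contents(c), aux(new Aux{this}) {}
    ~ToyContainer() { delete aux; }
    ToyContainer(const ToyContainer&) = delete;
    ToyContainer& operator=(const ToyContainer&) = delete;

    void swap(ToyContainer& other) {
        std::swap(contents, other.contents);
        std::swap(aux, other.aux);  // aux objects travel with the contents
        aux->owner = this;          // each aux now reports its new owner
        other.aux->owner = &other;
    }
};

// Hypothetical checked iterator: it points at the aux object, not the container.
struct ToyIterator {
    Aux* aux;
    explicit ToyIterator(const ToyContainer& c) : aux(c.aux) {}

    // Double indirection: iterator -> aux object -> parent container.
    ToyContainer* parent() const { return aux->owner; }
};

// Returns true if an iterator's parent follows the contents across a swap.
bool parent_tracks_swap() {
    ToyContainer a(1);
    ToyContainer b(2);
    ToyIterator it(a);  // initially, the parent is a
    a.swap(b);          // the contents of a (and a's aux object) move to b
    return it.parent() == &b && it.parent()->contents == 1;
}

// Small usage check; call it from main() to try the sketch.
void check_aux_demo() {
    assert(parent_tracks_swap());
}
```

Note that swap never touches the iterator.  The container only updates its own aux object, and every iterator sharing that aux object sees the new parent the next time it asks.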

 

The performance issue is that the aux object, while unavoidable (without “pimpling” the containers, which would probably be even more expensive for performance), is not free.  Each Standard container is now larger because it has to hold a pointer to its aux object.  The aux object has to be dynamically allocated, occupying more space and taking more time.  And _SECURE_SCL has to perform a double indirection when going from an iterator to its parent container.  I’ve measured the cost of the double indirection, and it is nontrivial: programs that use iterators in deeply nested loops may run at half their previous speed.  (In general, only hand-written loops will be significantly affected.  The Standard algorithms perform checking once and then “uncheck” their arguments for increased speed.  It’s yet another reason to avoid hand-written loops!)  Most programs should not experience noticeable performance changes because of this fix, and some programmers’ lives will be made easier because of the increased conformance, but other programmers will have to deal with the fallout of The Swap Fix.  That’s why I write it with capital letters.
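As a rough illustration of the hand-written-loop versus Standard-algorithm distinction, here are two ways to sum a vector.  The function names are mine, and the sketch makes no claim about actual timings.  It only shows the two shapes of code being contrasted.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hand-written loop: every increment, comparison, and dereference goes through
// the iterator itself, so each one can pay for checking when it is enabled.
long sum_by_hand(const std::vector<int>& v) {
    long total = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        total += *it;
    }
    return total;
}

// Standard algorithm: the whole range is handed to the library in one call,
// which gives the implementation the chance to check once up front.
long sum_by_algorithm(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Small usage check; call it from main() to try the sketch.
void check_sum_demo() {
    const std::vector<int> v = {1, 2, 3, 4};
    assert(sum_by_hand(v) == 10);
    assert(sum_by_algorithm(v) == 10);
}
```

Both functions compute the same result.  The difference is only in who drives the iteration, your loop or the library.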

 

This is something to keep in mind: although performance is important, it is not all-important.  Correctness and security trump performance.  _SECURE_SCL increases security, and this fix is necessary to restore correctness.  The performance in VC8 was an illusion, since it was obtained at the cost of correctness.  Orcas’s performance will reflect the true cost of _SECURE_SCL.  As before, programs will be able to turn off _SECURE_SCL in order to extract maximum performance.  How to disable _SECURE_SCL (and _HAS_ITERATOR_DEBUGGING) properly might be a topic for a future blog post – it’s easy to do incorrectly, and we’re thinking about ways to make this process more robust in Orcas + 1.

 

STL