Last time, we looked at how to use the decltype keyword in Visual C++ 2010 to deduce the type of an expression (see The Concurrency Runtime and Visual C++ 2010: The decltype Type Specifier). This week we’ll look at rvalue references, a Visual C++ language feature that can help you to further improve the performance of your applications that use the Concurrency Runtime.
Rvalue References at a Glance
The vector<T> constructor invokes the copy constructor of T, which in this example creates a copy of the string. Now consider what happens when the following conditions hold:
1. The variable s is not referenced elsewhere in wmain.
2. You need to insert many strings into the vector container.
When you write parallel code, this pattern appears often because you are likely processing thousands or millions of objects (or more!). Because s is not referenced elsewhere in the program, you don’t require a copy to be made. And when you create many strings, you are allocating time and memory for each copy operation. As the size of type T grows, the time required by each individual copy operation increases. This can cause a performance bottleneck over the lifetime of your application.
Visual C++ 2010 introduces the concept of rvalue references which enable two things: perfect forwarding, or the ability to forward arbitrary arguments to another function, and move semantics, or the transfer of resources (such as memory) from one object to another. I won’t go into the nitty-gritty of how perfect forwarding and move semantics work here; for in-depth details check out Rvalue Reference Declarator: && on MSDN and Stephan’s post Rvalue References: C++0x Features in VC10, Part 2 on the Visual C++ Team Blog (note that Stephan’s article discusses an early implementation of rvalue references, the rules for which have been changed in the final version of Visual C++ 2010).
You can easily take advantage of libraries that utilize move semantics by using other library types that also support move semantics (such as wstring) or by writing a move constructor for your type. For example, the std::vector::push_back member function takes either a const reference (const T&) or an rvalue reference (T&&) to the element to be added to the container. When you pass an rvalue reference to a type such as wstring you get move construction for free because wstring provides a move constructor. The following shows how to modify the previous example to take advantage of move semantics to increase memory performance:
In this example, the parameter to push_back is an rvalue reference because it is a temporary value that does not persist beyond the expression that uses it. Because no other part of the program can access this temporary, it is safe to move its contents to the wstring object that is stored in the vector container.
If you replace type wstring with your own custom type, you can take advantage of move semantics by writing a move constructor for your type. We’ll look at an example that uses a custom type that enables move semantics next.
Rvalue References and the Concurrency Runtime
If you’re using the Concurrency Runtime Sample Pack, you may have noticed the new container types concurrent_unordered_set, concurrent_unordered_multiset, concurrent_unordered_map, and concurrent_unordered_multimap in version 0.33. Much like std::vector::push_back, these container types support perfect forwarding and move construction during insert operations. Version 0.33 of the Sample Pack also introduces the parallel algorithms parallel_reduce, parallel_transform, and parallel_sort, which use move semantics.
To see the advantages that move semantics provide, let’s look at an example that computes the time that it takes to use parallel_sort to sort a collection of remote_integer objects (I borrowed the remote_integer class from Stephan’s post). I modified Stephan’s example to use Concurrency::combinable objects to count the number of copy and move operations that occur in the remote_integer class (the combinable class enables us to perform this count concurrently and in a lock-free manner). For now, pretend that the code in the #ifdef MOVABLE section hasn’t been written yet.
When compiled as-is (in other words, with the #ifdef MOVABLE block disabled), this example produces the following sample output on my quad-core machine:
Took 17360 ms.
Copy construction: 304059508
Move construction: 0
Don’t get me wrong, this isn’t bad at all. It totally beats std::sort and fully utilizes my CPU resources (try it out for yourself and open up Windows Task Manager). But now enable move construction (either by removing the lines that begin with #ifdef and #endif or by compiling with /D “MOVABLE”). When I recompile and run the program, I get the following output:
Took 3797 ms.
Copy construction: 0
Move construction: 304776685
Don’t confuse the performance benefits of move semantics with parallel code. Move semantics on its own has nothing to do with parallelism. But in this example I combined the benefits of move semantics with a parallel algorithm to get awesome performance gains!
Sidebar – Stephan from the Visual C++ team clued me in to one additional benefit to using rvalue references in your parallel code: Copy construction that requires dynamic memory allocation (such as HeapAlloc) causes lock contention. Because move semantics avoids dynamic memory allocation you can avoid this lock contention and get even better performance gains! – End Sidebar
For illustration, this example uses the parallel_sort algorithm. If you’re interesting in learning more about the different kinds of parallel sorting algorithms that are provided in the Sample Pack, including when to use each one, see Vinod’s posts Sorting in PPL and How to pick your parallel sort? If you’d like to read more details about how to provide move semantics in your code, see How to: Write a Move Constructor.
Not only does move construction help your programs to run more efficiently, but it also enables you to work with resources that are not copy-constructible. For example, you can insert std::unique_ptr objects into concurrent_unordered_set, concurrent_unordered_multiset, concurrent_unordered_map, and concurrent_unordered_multimap objects. You can imagine other resource abstractions, such as network or database connections, that do not allow copy construction.
To summarize, rvalue references can help you improve memory performance by minimizing the number of memory allocation, deallocation, and copy operations. However, you likely won’t notice performance improvements unless you make use of this feature many times over the lifetime of your application. The greater the size of your data payloads or the number of move operations you perform, the greater the improvements you’ll likely see.
That wraps it up for some of the language improvements in Visual C++ 2010. Next time we’ll talk about a library feature that makes it easier to work with exceptions that are thrown by the bodies of concurrent tasks.