The Concurrency Runtime and Visual C++ 2010: Rvalue References

Last time, we looked at how to use the decltype keyword in Visual C++ 2010 to deduce the type of an expression (see The Concurrency Runtime and Visual C++ 2010: The decltype Type Specifier). This week we’ll look at rvalue references, a Visual C++ language feature that can help you to further improve the performance of your applications that use the Concurrency Runtime.

Rvalue References at a Glance

Before we look at the Concurrency Runtime, consider the following basic program that inserts a std::wstring object at the back of a std::vector object.

    1: // copy-construction.cpp
    2: // compile with: /EHsc
    3: #include <vector>
    4: #include <string> 
    5:  
    6: int wmain() 
    7: {
    8:    std::wstring s(L"Hello, World!");
    9:  
   10:    std::vector<std::wstring> v;
   11:    v.push_back(s);
   12:  
   13:    // ...
   14: }

The vector<T> constructor invokes the copy constructor of T, which in this example creates a copy of the string. Now consider what happens when the following conditions hold:

1. The variable s is not referenced elsewhere in wmain.

2. You need to insert many strings into the vector container.

When you write parallel code, this pattern appears often because you are likely processing thousands or millions of objects (or more!). Because s is not referenced elsewhere in the program, you don’t require a copy to be made. And when you create many strings, you are allocating time and memory for each copy operation. As the size of type T grows, the time required by each individual copy operation increases. This can cause a performance bottleneck over the lifetime of your application.

Visual C++ 2010 introduces the concept of rvalue references which enable two things: perfect forwarding, or the ability to forward arbitrary arguments to another function, and move semantics, or the transfer of resources (such as memory) from one object to another. I won’t go into the nitty-gritty of how perfect forwarding and move semantics work here; for in-depth details check out Rvalue Reference Declarator: && on MSDN and Stephan’s post Rvalue References: C++0x Features in VC10, Part 2 on the Visual C++ Team Blog (note that Stephan’s article discusses an early implementation of rvalue references, the rules for which have been changed in the final version of Visual C++ 2010).

You can easily take advantage of libraries that utilize move semantics by using other library types that also support move semantics (such as wstring) or by writing a move constructor for your type. For example, the std::vector::push_back member function takes either a const reference (const T & ) or an rvalue reference (T && ) to the element to be added to the container. When you pass an rvalue reference to a type such as wstring you get move construction for free because wstring provides a move constructor. The following shows how to modify the previous example to take advantage of move semantics to increase memory performance:

    1: // move-construction.cpp
    2: // compile with: /EHsc
    3: #include <vector>
    4: #include <string> 
    5:  
    6: int wmain() 
    7: {
    8:    std::vector<std::wstring> v;
    9:    v.push_back(std::wstring(L"Hello, World!"));
   10:  
   11:    // ...
   12: }

In this example, the parameter to push_back is an rvalue reference because it is a temporary value that does not persist beyond the expression that uses it. Because no other part of the program can access this temporary, it is safe to move its contents to the wstring object that is stored in the vector container.

If you replace type wstring with your own custom type, you can take advantage of move semantics by writing a move constructor for your type. We’ll look at an example that uses a custom type that enables move semantics next.

Rvalue References and the Concurrency Runtime

If you’re using the Concurrency Runtime Sample Pack, you may have noticed the new container types concurrent_unordered_set, concurrent_unordered_multiset, concurrent_unordered_map, and concurrent_unordered_multimap in version 0.33. Much like std::vector::push_back, these container types support perfect forwarding and move construction during insert operations. Version 0.33 of the Sample Pack also introduces the parallel algorithms parallel_reduce, parallel_transform, and parallel_sort, which use move semantics.

To see the advantages that move semantics provide, let’s look at an example that computes the time that it takes to use parallel_sort to sort a collection of remote_integer objects (I borrowed the remote_integer class from Stephan’s post). I modified Stephan’s example to use Concurrency::combinable objects to count the number of copy and move operations that occur in the remote_integer class (the combinable class enables us to perform this count concurrently and in a lock-free manner). For now, pretend that the code in the #ifdef MOVABLE section hasn’t been written yet.

    1: // parallel-sorting.cpp
    2: // compile with (move semantics disabled): /EHsc
    3: // compile with (move semantics enabled):  /EHsc /D "MOVABLE"
    4: #include "ppl_extras.h"
    5: #include <Windows.h>
    6: #include <vector>
    7: #include <random>
    8: #include <algorithm>
    9: #include <iostream>
   10:  
   11: #include <agents.h>
   12: #include <numeric>
   13:  
   14: using Concurrency::combinable;
   15: using Concurrency::samples::parallel_sort;
   16: using namespace std;
   17:  
   18: // Calls the provided work function and returns the number of milliseconds 
   19: // that it takes to call that function.
   20: template <class Function>
   21: __int64 time_call(Function&& f)
   22: {
   23:    __int64 begin = GetTickCount();
   24:    f();
   25:    return GetTickCount() - begin;
   26: }
   27:  
   28: // The remote_integer class holds a pointer to a heap-allocated 
   29: // resource (an int value).
   30: // To demonstrates the difference between copy and move construction,
   31: // this class holds pointers to two Concurrency::combinable objects. 
   32: // One combinable object counts calls to the copy constructor and copy
   33: // assignment operator; the other counts call to the move constructor and
   34: // move assignment operator.
   35: // Move semantics are disabled by default. Compile with /D "MOVABLE"
   36: // to enable move semantics.
   37: class remote_integer {
   38: public:
   39:    // Default constructor.
   40:    remote_integer() : _p(nullptr) {
   41:    }
   42:  
   43:    // Unary constructor.
   44:    explicit remote_integer(const int n) : _p(new int(n)) {
   45:    }
   46:  
   47:    // Copy constructor.
   48:    remote_integer(const remote_integer& other) {
   49:       if (other._p) {
   50:          _p = new int(*other._p);
   51:       } else {
   52:          _p = nullptr;
   53:       }
   54:       increment(_copy_calls);
   55:    }
   56:    
   57:    // Copy assignment operator.
   58:    remote_integer& operator=(const remote_integer& other) {
   59:       remote_integer(other).swap(*this);
   60:       increment(_copy_calls);
   61:       return *this;
   62:     }
   63:       
   64: #ifdef MOVABLE
   65:    // Move constructor.
   66:    remote_integer(remote_integer&& other) {      
   67:       _p = other._p;
   68:       other._p = nullptr;
   70:       increment(_move_calls);
   71:    }
   72:    
   73:    // Move assignment operator.
   74:    remote_integer& operator=(remote_integer&& other) {
   75:       remote_integer(move(other)).swap(*this);
   76:       increment(_move_calls);
   77:       return *this;
   78:    }
   79: #endif // #ifdef MOVABLE
   80:  
   81:    // Destructor.
   82:    ~remote_integer() {
   83:       delete _p;
   84:    } 
   85:  
   86:    // Retrieves the underlying value.
   87:    int get() const {
   88:       return _p ? *_p : 0;
   89:    }
   90:  
   91:    void swap(remote_integer& other) {
   92:       std::swap(_p, other._p);
   93:    }
   94:       
   95:    // Counts calls to the copy constructor and copy assignment operator.
   96:    static combinable<size_t>* _copy_calls;
   97:    // Counts calls to the move constructor and move assignment operator.
   98:    static combinable<size_t>* _move_calls;
   99:  
  100: private:
  101:    // Increments the local value of the provided combinable object.
  102:    void increment(combinable<size_t>* c)
  103:    {
  104:       if (c != nullptr) {
  105:          c->local()++;
  106:       }
  107:    }
  108:    int * _p;
  109: };
  110:  
  111: // Define operator< for use with parallel_sort.
  112: bool operator<(const remote_integer& a, const remote_integer& b) {
  113:    return a.get() < b.get();
  114: }
  115:  
  116: combinable<size_t>* remote_integer::_copy_calls  = nullptr;
  117: combinable<size_t>* remote_integer::_move_calls  = nullptr;
  118:  
  119: int main()
  120: {  
  121:    // Set up the remote_integer combinable objects.
  122:    combinable<size_t> copy_calls, move_calls;
  123:    remote_integer::_copy_calls  = &copy_calls;
  124:    remote_integer::_move_calls  = &move_calls;
  125:  
  126:    // Create a vector object that contains ten million 
  127:    // remote_integer objects that each has a random value.
  128:    vector<remote_integer> elements(10000000);
  129:    mt19937 gen(42);
  130:    std::generate(elements.begin(), elements.end(), [&gen]() {
  131:       return remote_integer(gen());
  132:    });
  133:    
  134:    // Compute the time that it takes to sort the elements.
  135:    auto elapsed = time_call([&elements] {
  136:       parallel_sort(elements.begin(), elements.end());
  137:    });
  138:  
  139:    // Print the elapsed time.
  140:    wcout << L"Took " << elapsed << L" ms." << endl;
  141:    
  142:    // Print the count of copy and move calls.
  143:    wcout << L"Copy construction:  " 
  144:       <<  copy_calls.combine(std::plus<size_t>()) << endl;
  145:    wcout << L"Move construction:  " 
  146:       <<  move_calls.combine(std::plus<size_t>()) << endl; 
  147: }

When compiled as-is (in other words, with the #ifdef MOVABLE block disabled), this example produces the following sample output on my quad-core machine:

Took 17360 ms.

Copy construction: 304059508

Move construction: 0

Don’t get me wrong, this isn’t bad at all. It totally beats std::sort and fully utilizes my CPU resources (try it out for yourself and open up Windows Task Manager). But now enable move construction (either by removing the lines that begin with #ifdef and #endif or by compiling with /D “MOVABLE” ). When I recompile and run the program, I get the following output:

Took 3797 ms.

Copy construction: 0

Move construction: 304776685

Don’t confuse the performance benefits of move semantics with parallel code. Move semantics on its own has nothing to do with parallelism. But in this example I combined the benefits of move semantics with a parallel algorithm to get awesome performance gains!

Sidebar – Stephan from the Visual C++ team clued me in to one additional benefit to using rvalue references in your parallel code: Copy construction that requires dynamic memory allocation (such as HeapAlloc) causes lock contention. Because move semantics avoids dynamic memory allocation you can avoid this lock contention and get even better performance gains! – End Sidebar

For illustration, this example uses the parallel_sort algorithm. If you’re interesting in learning more about the different kinds of parallel sorting algorithms that are provided in the Sample Pack, including when to use each one, see Vinod’s posts Sorting in PPL and How to pick your parallel sort? If you’d like to read more details about how to provide move semantics in your code, see How to: Write a Move Constructor.

Not only does move construction help your programs to run more efficiently, but it also enables you to work with resources that are not copy-constructible. For example, you can insert std::unique_ptr objects into concurrent_unordered_set, concurrent_unordered_multiset, concurrent_unordered_map, and concurrent_unordered_multimap objects. You can imagine other resource abstractions, such as network or database connections, that do not allow copy construction.

To summarize, rvalue references can help you improve memory performance by minimizing the number of memory allocation, deallocation, and copy operations. However, you likely won’t notice performance improvements unless you make use of this feature many times over the lifetime of your application. The greater the size of your data payloads or the number of move operations you perform, the greater the improvements you’ll likely see.

That wraps it up for some of the language improvements in Visual C++ 2010. Next time we’ll talk about a library feature that makes it easier to work with exceptions that are thrown by the bodies of concurrent tasks.