Sorting by indices, part 1


Okay, now we're going to start using the apply_permutation function that we beat to death for first part of this week.

Suppose you are sorting a collection of objects with the property that copying and moving them is expensive. (Okay, so in practice, moving is not expensive, so let's say that the object is not movable at all.) You want to minimize the number of copies.

The typical solution for this is to perform an indirect sort: Instead of moving expensive things around, use an inexpensive thing (like as an integer) to represent the expensive item, and sort the inexpensive things. Once you know where everything ends up, you can move the expensive things just once.

template<typename Iter, typename Compare>
void sort_minimize_copies(Iter first, Iter last, Compare cmp)
{
 using Diff = std::iterator_traits<Iter1>::difference_type;
 Diff length = last - first;
 std::vector<Diff> indices(length);
 std::iota(indices.begin(), indices.end(), static_cast<Diff>(0));
 std::sort(indices.begin(), indices.end(),
    [&](Diff a, Diff b) { return cmp(first[a], first[b]); });
 apply_permutation(first, last, indices.begin());
}

template<typename Iter>
void sort_minimize_copies(Iter first, Iter last)
{
    return sort_minimize_copies(first, last, std::less<>());
}

We use std::iterator_traits to tell us what to use to represent indices, then we create a vector of those indices. (The difference type is required to be an integral type, so we didn't have to play any funny games like first - first to get the null index. We could just write 0.)

We then sort the indices by using the indices to reference the original collection. (We also provide an overload that sorts by <.) This performs an indirect sort, where we are sorting the original collection, but doing so by mainpulating indices rather than the actual objects.

Once we have the indices we need, we can use the apply_permutation function to rearrange the original items according to the indices.

We'll wrap up next time with another kind of sorting.

Comments (4)
  1. Sebastian Johansson says:

    Well, that’s the first time I’ve been so quick to a post that the post itself didn’t exist when it showed up in the list.

  2. Billy O'Neal says:

    On 64 bit, you may want to consider using uint32_t/int32_t instead of difference_type, which is typically 64 bits (ptrdiff_t). Depends if greater than 4B/2B elements is sane for your application.

    1. kantos says:

      The only issue with that is you enter the “wonderful” world of narrowing conversions because the result of arithmetic operations on two random access iterators is always difference_type. While 32bit operations are technically faster on intel hardware (and yes use fewer opcode bytes); the difference between them is not significant unless you’re in a hot loop. Moreover on other platforms using narrowed types may even occur an overhead (Alpha AXP would require sign extending the 32 bit result then operating on it). As such I think the above solution is the best and safest portable solution.

      To quote STL: “Don’t help the compiler” in most cases those little ‘helps’ can be wrong and even hurt performance or portability.

  3. Joshua says:

    @Raymond:
    Like ildjarn mentioned yesterday, this line:
    using Diff = std::iterator_traits<Iter1>::difference_type;
    should be:
    using Diff = typename std::iterator_traits<Iter>::difference_type;
    … because Iter is a template parameter. Even though cl.exe doesn’t care about the typename in front, it’s required by Standard C++ and every other standard-compliant compiler will take issue with it. If you change your “Platform Toolset” to “Visual Studio 2015 – Clang with Microsoft CodeGen (v140_clang_c2)” in Visual Studio 2015, you’ll immediately see these type of errors as:
    error : missing 'typename' prior to dependent type name 'std::iterator_traits<Iter>::difference_type'

    Also, at the time that I’m writing this comment, your first template parameter is typename Iter, but you are using using Diff = std::iterator_traits<Iter1>::difference_type; in the body … so just a small typo … possibly a holdover from your last post.

Comments are closed.

Skip to main content