Compile-time Binary Search

 

A while back I was writing a generic b-tree implementation that used a memory-mapped file for persisting / retrieving data. I wanted it to be used in the same way that std::map is, like this:

 

      btree<key_t, value_t> b_tree;

 

There are a bunch of other implementation details around specifying the file path for the backing store, etc. But, those aren’t particularly important. The challenge I faced was figuring out how many elements I could store in a single node in the b-tree (if you need a refresher on b-trees, read the Wikipedia article).

Initially, the problem seems pretty simple. Basically, there is some bookkeeping data, e.g. parent & child references (not actually pointers, since this is an on-disk structure), that are used for maintaining the b-tree’s structure and then there is an array of elements that use up the rest of the space in the node. It looks something like this:

template <class key_t, class record_t, rec_count_t records_per_node>

class node

{

      // bookkeeping data

      mmap_file<>* file_;

      node_index_t this_node_;

      node_index_t parent_node_;

      node_index_t larger_node_;

      rec_count_t record_count_;

      // element array

      entry_t entries_[records_per_node_];

};

 

So, all you have to do is figure out how much room that bookkeeping data takes up, subtract it from the total size of a node and use the remaining space for the element array. Simple, right?

Well, it turns out to be slightly more complicated than that. When you declare classes in C++, the compiler is given leeway on how it lays out the individual members of the class. It is allowed to rearrange them, add padding to properly align the items in memory, etc. So, just knowing the sizes of the individual members does not tell you how much space they will actually take up.

I came up with many different ways to attempt to calculate how much free space was in the node. However, every technique I used was in some way making assumptions about how the compiler would lay out the class.

I even considered using ‘#pragma pack’ to force the compiler to force the compiler not to put in any padding. However, this was a bad idea, since this is a generic data structure and I don’t know what type of data will be stored in it. Having a huge array of misaligned elements could be catastrophic from a performance standpoint.

At this point, I figured there had to be a simpler way. In reality, what I was trying to calculate was pretty simple. I just needed my class to fit into a node and stored as many elements as possible. All the information needed for making this decision is available at compile time.

That’s when it dawned on me that this was a perfect situation for template metaprogramming. If you haven’t used this programming paradigm before, I highly recommend reading Andre Alexandrescu ‘s excellent book, Modern C++ Design. Once you’ve read that, you should get familiar with Boost’s MPL. It is one of the most interesting libraries I have ever looked at. It takes the STL concepts of containers, iterators, functors, etc. and applies them to metaprogramming. Very interesting, indeed! J

OK, so I know that I want to be able to calculate at compile time how many elements can fit in a node. But, how do I accomplish that? Actually, it is conceptually pretty simple. The sizeof() operator can be used at compile time to calculate the size of a particular type. So, all I need to do is find the largest number of elements where sizeof(node<key_t, record_t, records_per_node>) is greater than or equal to the maximum node size. The pseudo-code looks something like this:

// keep incrementing until it is just a bit too big

num_recs = 0;

while(sizeof(node<key_t,record_t,++num_recs>)<=max_size);

// the previous one was the right size

--num_recs;

Now, in template metaprogramming, there are no loops. So, just like in functional languages, we have to use recursion, instead.

I implemented the above algorithm using compile-time recursion. When I did, I found two things:

1. It was slow. It turns out that the minimum page size for memory-mapped files is 64KB. That means that, for small record sizes, thousands of records fit in a single node. For these cases, I noticed a considerable delay during compilation while this calculation was occurring.

2. It caused the complier to crash with a stack overflow! As you learn when you start doing template metaprogramming, you need to be careful about how much recursion you do, since the compiler’s stack is limited.

There are lots of techniques for reducing the amount of recursion used during compilation. In my case, I decided to change my algorithm. Instead of doing a linear scan of sizes, I decided to implement a binary search. I will not go into the specifics of implementing a binary search, since most people should be familiar with it.

The resulting code was somewhat difficult to read. But, when writing a generic class that you expect to be used over-and-over again, you often sacrifice beauty for efficiency. Overall, I think the code makes a lot of sense. The only real confusing part of it is the use of eval_if_c<> to prevent the compiler from going into an infinite recursion. Hopefully, with a little bit of inspection, the technique I used becomes apparent. If anyone has any questions about how it works, please post questions and I will answer them.

To use this class (see below), I just do the following:

const int num_recs = node_max_sizer<
key_t,record_t,max_size >::records_per_node_;

 

num_recs now contains the maximum number of elements that can fit in a node. Here is the entire class:

#include <boost/mpl/identity.hpp>

#include <boost/mpl/comparison.hpp>

 

using namespace boost::mpl;

// performs a binary search at compile time to find the largest number of records that fit

template< class key_t,

class record_t,

int max_size,

int records_per_node = max_size/2,

int step = max_size/2,

bool found = false>

struct node_max_sizer

{

      typedef impl_::node<key_t,record_t,records_per_node> this_t;

      // cut the step size in half each time (can't be smaller than 1)

      static const int step_ =

if_c<step<=1, int_<1>, int_<step/2> >::type::value;

      // calc the size of this and the next block

      static const int size_ = sizeof(this_t);

      static const int next_size_ =

sizeof(impl_::node<key_t,record_t,records_per_node+1>);

      // if we are at the largest size that fits, we have found what we're looking for

      static const bool found_ = (size_<=max_size) && (next_size_>max_size);

      //

      typedef typename eval_if_c<found_,

                        this_t,

                        eval_if_c<(size_<max_size),

                              typename identity<node_max_sizer<key_t,record_t,max_size,records_per_node+step_, step_, found_> >::type ,

                              typename identity<node_max_sizer<key_t,record_t,max_size,records_per_node-step_, step_, found_> >::type

                              >

                        > type;

      static const int records_per_node_ = type::records_per_node_;

};