Note: while this is about an Active Directory database, Exchange is
based on the same database technology, so it would (and does) have a similar space hierarchy.

Table of ESE Space usage:

The black in this table is just the output from 3 of the columns of esentutl /ms adamntds.dit (original report); the blue are columns/rows I’ve added to break out the space usage in a clearer way:

Name Friendly name Owned Available Owned(GB) Avail(GB) b-tree lvls % of DB % of Total Idxs

<calc: DB real> 268179344 6088 0.000 0.046

268179344 218 2046.046 0.002

<calc: datatable real> 268178751 5689 2046.041 0.043

268178751 2260 2046.041 0.017

<calc: Row Data> 186707905 2260 1424.468 0.017 5 69.62%
<Long Values> 42 18 0.000 0.000 2 0.00%

<sum: Idx Totals> 81470804 3411 621.573 0.026 0 30.38% 100.00%
Int: PDNT + Name 11482791 7 87.607 0.000 5 4.28% 14.09%
Int: NC + objGuid 10870892 5 82.938 0.000 4 4.05% 13.34%
Att: objectGuid 10791866 3 82.335 0.000 4 4.02% 13.25%
Att: cn 9658638 51 73.690 0.000 4 3.60% 11.86%
Att: name 8999729 83 68.662 0.001 4 3.36% 11.05%
Int: Ancestry 7917036 10 60.402 0.000 4 2.95% 9.72%
Int: Repl USN 7083583 251 54.043 0.002 4 2.64% 8.69%
Att: objectCategory 5274627 21 40.242 0.000 4 1.97% 6.47%
Int: Repl Created USN 4479144 34 34.173 0.000 4 1.67% 5.50%
Att: uSNChanged 4279711 0 32.652 0.000 4 1.60% 5.25%
Int: deltime 371267 15 2.833 0.000 4 0.14% 0.46%
Att: isDeleted 261493 2929 1.995 0.022 4 0.10% 0.32%

… deleted about a dozen small indices …

I’ll walk through the calculations I performed on the esentutl /ms output, in the hope that it will be clear …

First I sum up the owned space for all indices in the datatable; this comes out to
81470804 pages (the “<sum: Idx Totals>” line).
Note the #’s above may not add up exactly because I deleted a dozen or
so super small indices.  I summed up all the indices because it
makes the next calculation easier, and also so we can get the “% of
Total Idxs” column.
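
As a rough illustration of how the two percentage columns can be reproduced, here is a minimal Python sketch.  The page counts are typed in by hand from the table above, the helper name pct is made up for this example, and I am assuming “% of DB” divides by the whole file’s owned page count (dividing by the datatable’s count gives the same answer to two decimals).

```python
# Owned page counts copied by hand from the table above.
# Only the rows kept in the table are listed here.
index_owned_pages = {
    "Int: PDNT + Name": 11482791,
    "Int: NC + objGuid": 10870892,
    "Att: objectGuid": 10791866,
    "Att: cn": 9658638,
    "Att: name": 8999729,
    "Int: Ancestry": 7917036,
    "Int: Repl USN": 7083583,
    "Att: objectCategory": 5274627,
    "Int: Repl Created USN": 4479144,
    "Att: uSNChanged": 4279711,
    "Int: deltime": 371267,
    "Att: isDeleted": 261493,
}

IDX_TOTAL_PAGES = 81470804   # "<sum: Idx Totals>" row (includes the omitted small indices)
DB_PAGES = 268179344         # owned pages for the whole file (assumed denominator for "% of DB")


def pct(part, whole):
    """Percentage of `whole` taken up by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Pages owned by the small indices that were dropped from the table.
listed_sum = sum(index_owned_pages.values())
omitted_pages = IDX_TOTAL_PAGES - listed_sum

for name, pages in index_owned_pages.items():
    print(f"{name:24s} {pct(pages, DB_PAGES):6.2f}% of DB "
          f"{pct(pages, IDX_TOTAL_PAGES):6.2f}% of idxs")
print("pages owned by omitted indices:", omitted_pages)
```

The small leftover in omitted_pages is why the listed rows do not add up exactly to the total.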

The first thing to understand is that ESE’s “owned” space is hierarchical: the “datatable” owns all the space owned by each of
the indices and by the LV B-Tree in the datatable.  But the primary B-Tree for the datatable
also contains (and thus owns) the normal row data.  So the space actually used by the
regular row data for the datatable is
268178751 (datatable) - 42 (owned by the datatable’s LV B-Tree) - 81470804 (owned by the sum of all datatable indices) = 186707905 pages (i.e. the “<calc: Row Data>” line).
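
The same subtraction as a small Python sketch, in case it helps to see the hierarchy spelled out.  The function name own_pages is hypothetical, and the numbers are simply the ones from the table.

```python
DATATABLE_OWNED = 268178751   # owned pages reported for the datatable
LV_OWNED = 42                 # owned by the datatable's Long Values B-Tree
IDX_TOTAL_OWNED = 81470804    # owned by all datatable indices combined


def own_pages(parent_owned, children_owned):
    """Pages a B-Tree holds itself, after subtracting what its child trees own."""
    return parent_owned - sum(children_owned)


# What is left after removing the LV tree and the indices is the row data.
row_data_pages = own_pages(DATATABLE_OWNED, [LV_OWNED, IDX_TOTAL_OWNED])
print(row_data_pages)
```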

I then created a couple of columns to turn these page counts into a usable unit (GBs), i.e. <# of pages> * 8 / 1024 / 1024 (the pages are 8 KB each).
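
A minimal Python sketch of that conversion, assuming the 8 KB page size implied by the formula; pages_to_gb is just an illustrative name.

```python
PAGE_SIZE_KB = 8  # page size implied by the "* 8" in the formula above


def pages_to_gb(pages, page_size_kb=PAGE_SIZE_KB):
    """Convert a page count to GB: pages * KB per page / 1024 / 1024."""
    return pages * page_size_kb / 1024 / 1024


# Page counts taken from the table above.
for label, pages in [("whole file", 268179344),
                     ("row data", 186707905),
                     ("all indices", 81470804)]:
    print(f"{label:12s} {pages_to_gb(pages):10.3f} GB")
```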

Finally I added a friendly name column, so you’d know roughly what the index was indexing.

Some analysis:

From the above table we can easily see that the row data is 1,424 GB and all
the indices combined are 621 GB.  This breaks out like so:

The table above shows us that a full 30% of this
database is indices!!!  That’s a huge amount.  This isn’t a
common space breakdown for most AD objects; it happens here because the objects making up
Eric’s DIT are very, very small / lightweight.  He was just
creating containers w/ minimal attributes (see Eric’s initial post),
and so just the base set of indices on a basic object adds up
to a significant portion of the object’s overall “footprint” in the DB.

As for the breakdown of the individual index usage, it looks something like this:

Of the secondary indices on the datatable,
10 are always updated!  Another 2 (the very slender ones) are
only updated on delete.  Since there are over 2 billion objects in
this database, that means we inserted about 22 billion B-Tree entries,
which is kind of neat.
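
One way to arrive at that figure, as a back-of-the-envelope Python sketch.  I am assuming each object costs one entry in the primary B-Tree plus one entry in each of the 10 always-updated secondary indices; the 2 billion is the rounded “over 2 billion” from the text, not an exact count.

```python
OBJECTS = 2_000_000_000        # rounded from "over 2 billion objects"
ALWAYS_UPDATED_SECONDARY = 10  # secondary indices touched on every insert
PRIMARY_TREES = 1              # assumption: one primary B-Tree entry per object


def btree_entries(objects, secondary_indices, primary_trees=PRIMARY_TREES):
    """Estimate total B-Tree entries inserted for a given number of objects."""
    return objects * (secondary_indices + primary_trees)


total_entries = btree_entries(OBJECTS, ALWAYS_UPDATED_SECONDARY)
print(f"{total_entries:,} B-Tree entries")
```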

One last, somewhat technical thing that I think a few of you might find
interesting is that even the
largest B-Tree, the 1,424 GB primary B-Tree, is only 5 levels deep.  This means that
locating a specific row (by DNT) will take at most 5 disk seeks in the worst
case (cold cache).  B-Trees have a very nice high fan-out that keeps disk seeks minimal.

Interestingly, I dumped the root page, and it only has 3 nodes (TAG 0
doesn’t count).  This means that we could add about 100x more
data to this B-Tree with no increase in the # of disk seeks needed to fetch a row
from this table.
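
To make the headroom argument concrete, here is a small Python sketch.  The depth of the tree only grows when the root page fills up and has to split, so the room left is roughly the root’s capacity divided by the nodes it uses today.  The capacity of 300 below is a made-up figure chosen to line up with the “about 100x” estimate; it is not a measured ESE value, and headroom_factor is an illustrative name.

```python
def headroom_factor(root_nodes_used, root_node_capacity):
    """Rough growth factor before the root page fills and the tree gains a level.

    Assumes the subtrees under each root node stay about the same size,
    so capacity scales with the number of nodes the root can hold.
    """
    return root_node_capacity / root_nodes_used


ROOT_NODES_USED = 3               # from the root page dump (TAG 0 not counted)
ASSUMED_ROOT_NODE_CAPACITY = 300  # hypothetical, not taken from ESE

factor = headroom_factor(ROOT_NODES_USED, ASSUMED_ROOT_NODE_CAPACITY)
print(f"roughly {factor:.0f}x more data before the tree needs a 6th level")
```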

Anyway that seems like enough for now …

Comments (17)

  1. Lee Flight says:

    Did you look at the performance hit of adding a new index on this data (assuming there was something to index on these lightweight objects)?


  2. BrettSh says:

    Unfortunately we were already over (time) budget by a week, so we weren’t able to try anything else.  We also wanted to see the rate of offline defrag, and a few other experiments.  We didn’t think of this idea; it is a good one, and we will keep it in mind next time we do a big test.


  4. Jim McBee says:

    Yep, that is one whopping big DIT.  Didn’t Compaq/HP create a massive Active Directory using W2K several years back that included every phone directory in the country?  Or was it the State of California?  I forget who did it, but they were trying to create the largest AD database that would feasibly ever be created.

  7. lol says:

