I decided to check out the size of Eric's DIT ...

06/12/2006

... take some time, measuring the exact dimensions of Eric's DIT
... and I must say, I've seen a fair number of DITs in my time, and I can say with a
fair amount of certainty, Eric's DIT is the biggest I've ever seen! I can't believe what a massively huge DIT Eric has.

Note: while this is about an Active Directory database, Exchange is
based on the same database technology, so it would (and does) have a similar space hierarchy.

Table of ESE Space usage:

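The Name, Owned and Available columns in this table are just the output from 3 of the columns of esentutl /ms adamntds.dit (black in the original report). The "<calc: ...>" and "<sum: ...>" rows, the Friendly name column and the GB columns (blue in the original report) are ones I've added to break out the space usage in a clearer way: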

Name	Friendly name	Owned	Available	Owned(GB)	Avail(GB)
	<calc: DB real>	268179344	6088	2046.046	0.046
F:\DNT\adamntds.dit		268179344	218	2046.046	0.002

	<calc: datatable real>	268178751	5689	2046.041	0.043
datatable		268178751	2260	2046.041	0.017
	<calc: Row Data>	186707905	2260	1424.468	0.017
<Long Values>		42	18	0.000	0.000
	<sum: Idx Totals>	81470804	3411	621.573	0.026
PDNT_index	Int: PDNT + Name	11482791	7	87.607	0.000
nc_guid_Index	Int: NC + objGuid	10870892	5	82.938	0.000
INDEX_00090002	Att: objectGuid	10791866	3	82.335	0.000
INDEX_00000003	Att: cn	9658638	51	73.690	0.000
INDEX_00090001	Att: name	8999729	83	68.662	0.001
Ancestors_index	Int: Ancestry	7917036	10	60.402	0.000
DRA_USN_index	Int: Repl USN	7083583	251	54.043	0.002
INDEX_0009030E	Att: objectCategory	5274627	21	40.242	0.000
DRA_USN_CREATED_index	Int: Repl Created USN	4479144	34	34.173	0.000
INDEX_00020078	Att: uSNChanged	4279711	0	32.652	0.000
deltime_index	Int: deltime	371267	15	2.833	0.000
INDEX_00020030	Att: isDeleted	261493	2929	1.995	0.022

... deleted about a dozen small indices ...

I'll discuss the calculations I performed on the esentutl /ms output, in the hopes it will be clear ...

First I sum up the owned space for all indices in the datatable; this comes out to 81470804 pages.
Note the numbers above may not add up exactly because I deleted a dozen or
so very small indices from the table. I summed up all the indices because it
makes the next calculation easier, and also so we can get the "% of
Total Idxs" column as well.
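As a rough check on that sum, the owned-page counts of the indices still listed in the table can be added up in a few lines of Python. This is only a sketch: the dictionary copies the Owned column from the table above, and the variable names are illustrative, not part of the esentutl output.

```python
# Owned page counts copied from the index rows of the table above.
listed_index_pages = {
    "PDNT_index": 11482791,
    "nc_guid_Index": 10870892,
    "INDEX_00090002": 10791866,
    "INDEX_00000003": 9658638,
    "INDEX_00090001": 8999729,
    "Ancestors_index": 7917036,
    "DRA_USN_index": 7083583,
    "INDEX_0009030E": 5274627,
    "DRA_USN_CREATED_index": 4479144,
    "INDEX_00020078": 4279711,
    "deltime_index": 371267,
    "INDEX_00020030": 261493,
}

# Value of the "<sum: Idx Totals>" row, which also covers the trimmed indices.
reported_index_total = 81470804

listed_total = sum(listed_index_pages.values())

# Pages owned by the small indices that were removed from the table.
trimmed_pages = reported_index_total - listed_total

print(listed_total, trimmed_pages)
```

The small gap between the two totals is the space owned by the trimmed indices, which is why the listed rows do not add up exactly to the reported sum.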

The key thing to understand is that ESE's "owned" space is hierarchical: the "datatable" owns all the space owned by each of
the indices and by the LV B-Tree in the datatable. But the primary B-Tree for the datatable
also contains (and thus owns) the normal row data. So the space actually used by the
regular row data for the datatable is 268178751 (datatable) - 42 (datatable's LV B-Tree owned) - 81470804 (owned by sum of all datatable indices) = 186707905 (i.e. the "<calc: Row Data>" line).
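The subtraction above can be written out as a tiny Python sketch. The variable names are made up for illustration; the numbers are the page counts from the table.

```python
# Page counts from the table; names are illustrative only.
datatable_owned = 268178751     # everything the datatable owns
long_values_owned = 42          # the datatable's <Long Values> B-Tree
index_total_owned = 81470804    # "<sum: Idx Totals>"

# Owned space is hierarchical, so what is left after removing the
# children (LV tree and indices) is the primary B-Tree's row data.
row_data_pages = datatable_owned - long_values_owned - index_total_owned

print(row_data_pages)  # 186707905
```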

I then created a couple of columns to turn these page counts into a usable unit (GB), i.e. <# of pages> * 8 / 1024 / 1024, since each page is 8 KB.
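A minimal sketch of that conversion, assuming the 8 KB page size implied by the formula; `pages_to_gb` is a hypothetical helper name, not something esentutl provides.

```python
PAGE_SIZE_KB = 8  # assumption taken from the "* 8" in the formula above


def pages_to_gb(pages: int) -> float:
    """Convert a page count to GB: pages * 8 KB / 1024 / 1024."""
    return pages * PAGE_SIZE_KB / 1024 / 1024


# A few rows from the table, rounded the same way as the GB columns.
print(round(pages_to_gb(268179344), 3))  # whole database, owned
print(round(pages_to_gb(186707905), 3))  # row data, owned
print(round(pages_to_gb(6088), 3))       # whole database, available
```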

Finally I added a friendly name column, so you'd know roughly what the index was indexing.

Some analysis:

From the above table we can easily see the row data is 1,424 GB and all
the indices combined are 621 GB.

Based on the table above, a full 30% of this
database is indices!!! That's a huge amount. This isn't a
common space breakdown for most AD objects; it happens here because the objects making up
Eric's DIT are very small and lightweight. He was just
creating containers w/ minimal attributes (see Eric's initial post),
so just the base set of indices on a basic object adds up
to a significant portion of each object's overall "footprint" in the DB.
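The 30% figure can be reproduced from the page counts in the table. The sketch below divides by the datatable's owned pages; that choice of denominator is my reading of the numbers, and the variable names are illustrative.

```python
# Page counts from the table above.
datatable_owned = 268178751
index_total_owned = 81470804
row_data_pages = 186707905

# Share of the datatable taken by indices vs. regular row data.
index_share = index_total_owned / datatable_owned
row_data_share = row_data_pages / datatable_owned

print(f"indices:  {index_share:.1%}")
print(f"row data: {row_data_share:.1%}")
```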

As for the breakdown of the individual index usages, see the per-index rows in the table above, which are sorted from largest to smallest.

Of the secondary indices on the datatable,
10 are always updated! And another 2 (the very slender ones) are
only updated on delete. Since there are over 2 billion objects in
this database, and each object gets a primary B-Tree entry plus 10 secondary index entries,
that means we inserted about 22 billion B-Tree entries,
kind of neat.
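The 22 billion figure appears to come from counting one primary B-Tree entry plus the ten always-updated secondary index entries per object. That reading is an assumption on my part, and the object count below is just the round "2 billion" from the text, not an exact number.

```python
# Assumed round figure; the post only says "over 2 billion objects".
approx_objects = 2_000_000_000

always_updated_secondary_indices = 10  # from the text
primary_btree_entries = 1              # assumption: one row per object

entries_per_object = always_updated_secondary_indices + primary_btree_entries
total_entries = approx_objects * entries_per_object

print(f"{total_entries:,}")
```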

One last, somewhat technical thing that I think a few of you might find interesting is that even the
largest B-Tree, the 1,424 GB primary B-Tree, is only 5 levels deep. This means that
locating a specific row (by DNT) takes only 5 disk seeks in the worst
case (cold cache). B-Trees have this very nice high fan-out that keeps disk seeks minimal.

Interestingly, I dumped the root page, and it only has 3 nodes (TAG 0
doesn't count). What this means is that we could add about 100x more
data to this B-Tree and there would be no increase in the # of disk seeks to fetch a row
from this table.
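A back-of-the-envelope sketch of why the headroom is on the order of 100x. It rests on several simplifying assumptions that are not in the dump itself: every non-root internal page has the same average fan-out, all row-data pages are treated as leaf pages, and a full root page could hold about as many children as any other internal page.

```python
# Figures from the post.
row_data_pages = 186707905  # "<calc: Row Data>", treated as leaf pages (assumption)
levels = 5                  # depth of the primary B-Tree
root_children = 3           # nodes found on the root page

# Level 1 is the root and level 5 is the leaves, so there are
# (levels - 2) internal levels between them. Under a uniform fan-out f:
#   leaf_pages ~= root_children * f ** (levels - 2)
internal_levels = levels - 2
avg_fanout = (row_data_pages / root_children) ** (1 / internal_levels)

# If the root could also hold about avg_fanout children, the tree could
# grow by this factor before needing a sixth level.
headroom = avg_fanout / root_children

print(f"estimated average fan-out: {avg_fanout:.0f}")
print(f"estimated growth headroom: {headroom:.0f}x")
```

Under these assumptions the estimate lands in the same ballpark as the "about 100x" above; the real figure depends on key sizes and how full the pages actually are.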

Anyway that seems like enough for now ...
