Determine the True Amount of Whitespace in an Exchange Database

We come across this question from time to time, so I thought it would be interesting to talk about… besides, I needed something to show that I still care about my blog.

So the question is…

“If I add up all the space used by mailboxes, why doesn’t it roughly match the size of the database? And what does the number in the 1221 event I receive represent?”

First, let’s look at the 1221 event:

       Event: 1221

       Source: MSExchangeIS Public

       Type: Information

       Category: General

       Description:

     The database "<storage_group>\<mailbox_store> (<server_name>)" has

     <nnn> megabytes of free space after online defragmentation has

       terminated.

The number presented is derived from the number of free pages available at the database root, the messages table, folders, and the attachments table. Statistically, we have learned over the years that these tables represent nearly 90% of the space used in the database.

To understand the reasoning behind this, a quick primer on space management in the database is needed. Here are some aspects to keep in mind:

· The fundamental space allocation is a 4k page.

· When a table requests space, it is allocated a series of 16 consecutive pages, not just one page at a time.

· A page cannot store data from two different tables or belong to two different trees.
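To put numbers on the first two points, here is a minimal Python sketch of the page and extent sizes. The constant and function names are my own for illustration; only the 4k page size and the 16-page run come from the list above.

```python
PAGE_SIZE_BYTES = 4 * 1024   # the "4k page" from the list above
PAGES_PER_EXTENT = 16        # consecutive pages handed out together


def extent_size_bytes() -> int:
    """Size in bytes of one run of 16 consecutive pages."""
    return PAGE_SIZE_BYTES * PAGES_PER_EXTENT


def pages_to_bytes(pages: int) -> int:
    """Convert a page count (as shown in a space dump) to bytes."""
    return pages * PAGE_SIZE_BYTES


print(extent_size_bytes())   # 65536, i.e. 64 KB per 16-page run
print(pages_to_bytes(116))   # 475136
```

The same conversion applies to any page count that appears later in the space dump.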

There are two levels of space management that occur in the database. One is at the global database level (space available to all tables in the database, stored in ranges of 16 consecutive pages) and the other is at the B+tree level. If you are unfamiliar with the concept of B+trees, it may be easier to think of them as tables. Tables maintain the state of the pages they own and do not free up pages to the overall database until a block of 16 consecutive pages is freed up. The reason for this is efficiency… by reusing pages that are physically adjacent to existing pages, we minimize the effort necessary to perform read and write operations. So by holding onto pages, we ensure that the large majority of a table’s pages are physically adjacent.
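The release rule can be sketched in a few lines of Python. This is a toy model, not how the store is actually implemented: the list of booleans standing in for a table's pages and the function name `releasable_runs` are assumptions made purely for illustration.

```python
from typing import List, Tuple

PAGES_PER_EXTENT = 16


def releasable_runs(free: List[bool],
                    run_len: int = PAGES_PER_EXTENT) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) for each non-overlapping run of
    run_len consecutive free pages in a table's page list."""
    runs = []
    count = 0
    for i, is_free in enumerate(free):
        count = count + 1 if is_free else 0
        if count == run_len:
            runs.append((i - run_len + 1, i + 1))
            count = 0  # start counting the next run from scratch
    return runs


# Hypothetical table: 15 free pages, 1 used, 20 free, 4 used.
pages = [True] * 15 + [False] + [True] * 20 + [False] * 4
runs = releasable_runs(pages)
released = sum(end - start for start, end in runs)

print(runs)                    # [(16, 32)]
print(sum(pages) - released)   # 19 free pages stay with the table
```

In this made-up layout 35 pages are free, but only one block of 16 qualifies for release, so 19 free pages remain owned by the table.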

In a database, we could literally have thousands of tables (at least one for each folder in every mailbox). As I mentioned, we know statistically that the messages and attachments tables represent 90% of the space used in the database. Likewise, they would also have the greatest percentage of free space, commonly referred to as white space, in the database.

In order to arrive at a calculation for the amount of available space an individual table possesses, we have to do a lot of work interrogating the space usage information of the table. If we were to do this for thousands of tables, it would take an incredible amount of time to perform the overall calculation. Given that the great majority of space statistically exists at the root, in the messages table, and in the attachments table, and given the amount of time it takes to calculate space, we only take this information into consideration when logging the 1221 event.

There is also a lot of other stuff hidden away in the database that mailboxes are not charged for. For example, there are search folders, indexes, and system tables that allow Exchange to operate. In addition, we have already stated that a page cannot belong to multiple tables, so if a page does have free space, it can only store new records for that table, and furthermore, it will only contain records that meet the criteria to be stored on that page. If you stored every record beginning with the letter “A” on one page and every record beginning with “B” on a second page, there is no reason a record beginning with “A” should appear on the page containing all the “B” values. Therefore, there can be space available on an individual page that also isn’t taken into consideration when calculating the 1221 event, since we only add up the empty/free pages in the messages and attachments tables.

Online Defragmentation takes aim at “compressing” records to fewer pages in a particular table. For example, if a table owned 100 pages, each half full, online defrag will attempt to free approximately half of the pages. When 16 contiguous pages are freed, the space is released back to the database itself to be used by other tables in the database. It is only the pages that are freed up in the messages and attachments tables that eventually get reported.
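As a back-of-the-envelope illustration of the 100-page example, the Python sketch below estimates a best case. It assumes records pack perfectly and that the freed pages happen to be contiguous, neither of which is guaranteed in practice; the function name `defrag_estimate` is hypothetical.

```python
import math

PAGES_PER_EXTENT = 16


def defrag_estimate(owned_pages: int, fill_fraction: float) -> dict:
    """Best-case estimate of what compacting a table could free."""
    pages_needed = math.ceil(owned_pages * fill_fraction)
    freed = owned_pages - pages_needed
    # Only whole blocks of 16 contiguous pages go back to the database.
    released = (freed // PAGES_PER_EXTENT) * PAGES_PER_EXTENT
    return {
        "pages_needed": pages_needed,
        "freed": freed,
        "released_to_database": released,
        "free_but_kept_by_table": freed - released,
    }


print(defrag_estimate(100, 0.5))
# {'pages_needed': 50, 'freed': 50, 'released_to_database': 48,
#  'free_but_kept_by_table': 2}
```

Even in this ideal case, 2 of the 50 freed pages stay with the table because they do not make up a full block of 16.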

If you want to take a more in-depth look at the space used in the database, then a space dump using ESEUTIL is necessary. For example (note the output is truncated for brevity):

 

 

C:\Program Files\Exchsrvr\MDBDATA>..\bin\eseutil /ms priv1.edb

Microsoft(R) Exchange Server Database Utilities

Version 6.5

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...

         Database: priv1.edb

****************************** SLV SPACE DUMP ******************************

Chunk Free Res Del Com |------------ Used ------------|

============================================================================

512 110 0 0 402 *************************

1024 0 512 0 0 ********************************

1536 512 0 0 0

2048 512 0 0 0

2560 512 0 0 0

3072 512 0 0 0

3584 512 0 0 0

4096 512 0 0 0

4608 118 0 0 394 ************************

5120 0 0 0 512 ********************************

5632 0 0 0 512 ********************************

6144 0 0 0 512 ********************************

6656 0 0 0 512 ********************************

7168 0 0 0 512 ********************************

7680 0 0 0 512 ********************************

8192 238 0 0 274 *****************

============================================================================

TOTALS:

         Free: 3538

     Reserved: 512

  Deleted: 0

    Committed: 4142

      Unknown: 0

              -------------

                       8192

****************************************************************************

******************************** SPACE DUMP ***********************************

Name Type ObjidFDP PgnoFDP PriExt Owned Available

===============================================================================

priv1.edb Db 1 1 256-m 8960 116

<SLV Avail Map> SLV 6 33 32-m 32 29

<SLV Owner Map> SLV 7 65 32-m 80 3

1-122 Tbl 75 301 8-s 8 3

  MsgFolderIndex7 Idx 77 302 1-s 1 0

  MsgFolderIndexPtagDel Idx 80 305 1-s 1 0

  MsgFolderIndexURLComp Idx 79 304 1-s 1 0

  RuleMsgFolderIndex Idx 78 303 1-s 1 0

1-23 Tbl 61 236 2-m 7778 16

  <Long Values> LV 222 237 1-m 7691 8

1-24 Tbl 63 257 8-s 8 3

  MsgFolderIndex7 Idx 65 258 1-s 1 0

  MsgFolderIndexPtagDel Idx 68 261 1-s 1 0

  MsgFolderIndexURLComp Idx 67 260 1-s 1 0

  RuleMsgFolderIndex Idx 66 259 1-s 1 0

1-33 Tbl 336 8821 8-m 14 1

  <Long Values> LV 342 8826 1-m 5 2

  ?T668f-T6654+Q3f88 Idx 354 8827 1-s 1 0

  MsgFolderIndex7 Idx 338 8822 1-s 1 0

  MsgFolderIndexPtagDel Idx 341 8825 1-s 1 0

  MsgFolderIndexURLComp Idx 340 8824 1-s 1 0

  RuleMsgFolderIndex Idx 339 8823 1-s 1 0

Folders Tbl 8 97 9-m 100 10

  <Long Values> LV 105 243 1-s 1 0

  *T668f+Q6749+S3001+Q6 Idx 60 232 1-s 1 0

  ?T668f+Q6749+S3001+Q6 Idx 59 109 1-m 9 0

  Folders Fid to Pfid I Idx 17 108 1-m 9 2

  FoldersIndex10 Idx 13 102 1-s 1 0

  FoldersIndex13 Idx 14 103 1-s 1 0

  FoldersIndex5 Idx 9 98 1-m 5 1

  FoldersIndex6 Idx 10 99 1-s 1 0

  FoldersIndex7 Idx 11 100 1-m 7 0

  FoldersIndex8 Idx 12 101 1-s 1 0

  Hashed URL Name Index Idx 15 104 1-m 5 0

  ScopeFIDs DeleteTime Idx 16 105 1-s 1 0

Mailbox Tbl 21 140 2-m 10 2

  MailboxIndex2 Idx 22 141 1-s 1 0

  MailboxIndex3 Idx 23 160 1-s 1 0

Msg Tbl 19 112 2-m 194 63

  <Long Values> LV 106 359 1-m 14 0

-------------------------------------------------------------------------------

                                                                            253

First, let’s take a look at the database row, shown below. There are two columns of interest, Owned and Available. The Owned value is the number of pages in the database that contain *some* data. The next value, Available, represents the number of free pages available at the database root level that can be distributed to tables as they need space to grow.

******************************** SPACE DUMP ***********************************

Name Type ObjidFDP PgnoFDP PriExt Owned Available

===============================================================================

priv1.edb Db 1 1 256-m 8960 116

Next, let’s look at the attachments table. You see below two rows: one called 1-23, which is the primary B+tree, and a <Long Values> row, which is where records, or fragments of records, that are too large to be stored in the primary table are stored. The value under the Owned column for 1-23 (7778) represents the total number of pages owned by this table. This encompasses the long values and any secondary indexes that could be in use by this table. The next value is the number of available pages (16) that can be reused by this individual table (but not by its indexes or Long Value tree). Of the 7778 pages in use by this table, the Long Value tree is occupying 7691 of them and has 8 pages available.

******************************** SPACE DUMP ***********************************

Name Type ObjidFDP PgnoFDP PriExt Owned Available

===============================================================================

1-23 Tbl 61 236 2-m 7778 16

  <Long Values> LV 222 237 1-m 7691 8

Next, let’s take a peek at a standard folder in use by someone in the database. The folder is 1-33. (Internally we represent tables as numbers.) This would typically represent someone’s INBOX folder or some other folder the user may have created. Here we have a primary entry, 1-33, that is the table itself. Listed under this row are the Long Value tree and indexes associated with this table. As you can see, overall this table occupies 14 pages, with 1 page available to the primary table. The Long Value tree owns 5 of the 14 pages owned by this table and has 2 free pages, while the table entry itself has only one. The thing to keep in mind is that the Owned value represents ALL pages in use by the table, its indexes, and its LV tree, while the Available value only represents the number of pages available to that individual index/LV/table and is not cumulative.

******************************** SPACE DUMP ***********************************

Name Type ObjidFDP PgnoFDP PriExt Owned Available

===============================================================================

1-33 Tbl 336 8821 8-m 14 1

  <Long Values> LV 342 8826 1-m 5 2

  ?T668f-T6654+Q3f88 Idx 354 8827 1-s 1 0

  MsgFolderIndex7 Idx 338 8822 1-s 1 0

  MsgFolderIndexPtagDel Idx 341 8825 1-s 1 0

  MsgFolderIndexURLComp Idx 340 8824 1-s 1 0

  RuleMsgFolderIndex Idx 339 8823 1-s 1 0
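To make the cumulative versus non-cumulative point concrete, here is a small Python sketch that parses the 1-33 rows above. The parser is my own illustration, not a tool that ships with Exchange. It assumes each row ends with the six columns Type, ObjidFDP, PgnoFDP, PriExt, Owned, and Available, and that everything before them is the name (names can contain spaces).

```python
from typing import NamedTuple


class SpaceRow(NamedTuple):
    name: str
    kind: str
    owned: int
    available: int
    is_child: bool  # indented rows belong to the table above them


def parse_space_row(line: str) -> SpaceRow:
    is_child = line[:1].isspace()
    # Split from the right so names containing spaces stay intact.
    name, kind, _objid, _pgno, _pri_ext, owned, available = (
        line.strip().rsplit(None, 6)
    )
    return SpaceRow(name, kind, int(owned), int(available), is_child)


SAMPLE = """\
1-33 Tbl 336 8821 8-m 14 1
  <Long Values> LV 342 8826 1-m 5 2
  ?T668f-T6654+Q3f88 Idx 354 8827 1-s 1 0
  MsgFolderIndex7 Idx 338 8822 1-s 1 0
  MsgFolderIndexPtagDel Idx 341 8825 1-s 1 0
  MsgFolderIndexURLComp Idx 340 8824 1-s 1 0
  RuleMsgFolderIndex Idx 339 8823 1-s 1 0
"""

rows = [parse_space_row(line) for line in SAMPLE.splitlines() if line.strip()]
table, children = rows[0], rows[1:]

# Owned on the Tbl row already includes the LV tree and indexes.
print(table.owned)                      # 14
# Available is per tree, so the rows must be added up.
print(sum(r.available for r in rows))   # 3
```

So for 1-33, the table row alone already tells you the 14 owned pages, but the 3 available pages only show up once the table, Long Value, and index rows are summed.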

At the end of the dump is a line similar to the following. It contains a summation of the total number of pages that are available throughout all the tables. By taking this value and multiplying it by 4096 (the page size in bytes), you arrive at the true amount of whitespace in the database.

-------------------------------------------------------------------------------

                                                                            253
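Worked through for the sample total above, the multiplication looks like this in Python. The function name is only for illustration; the 4096-byte page size and the value 253 come from the text and the dump.

```python
PAGE_SIZE_BYTES = 4096


def whitespace_bytes(available_pages: int) -> int:
    """Total whitespace in bytes for a given count of available pages."""
    return available_pages * PAGE_SIZE_BYTES


total = whitespace_bytes(253)
print(total)                             # 1036288
print(round(total / (1024 * 1024), 2))   # 0.99 (megabytes)
```

For this small sample database, that works out to just under 1 MB of whitespace.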

So the summation of all the mailboxes does not match the size of the database because, while only 100k of information in total may be stored in a mailbox, the tables that actually store the data could occupy a larger amount of space, depending upon the density of the pages at the time and whether pages have been freed up to the database root. I am not saying this is the answer to every discrepancy, but in the majority of cases this has proven to be the case.

I hope this was informative and if there is something I need to drill down on further, please let me know in your feedback.