SQL Server 2008 Data Collector Proof of Concept

In this blog post, I will go through the steps needed to prepare for a proof of concept (POC) for using the SQL Server 2008 R2 Data Collector (DC) and Management Data Warehouse (MDW) reports (covered in part 2) to get you up and running. I’ll cover the installation of the new Query Hash Stats collection set from Bart Duncan and how to get the MDW reports installed on your SQL Server Reporting Services (SSRS) server. My assumption for the POC is that you’ll have around 30 systems that you’ll want to monitor.

Download the Query Hash Statistics Collection Set

The Query Hash Statistics collection set was created by Bart Duncan of Microsoft. At the tail end of the SQL Server 2008 release cycle, the Manageability team ran out of time to incorporate the late changes that the Engine team made to support the “Query Fingerprint” and “SQL Fingerprint” features. See Bart Duncan’s blog post – “Query Fingerprints and Plan Fingerprints (The Best SQL 2008 Feature That You've Never Heard Of)” for a deep understanding of how this works as well.

What it this means for the Data Collector is that the collection of Query Plans and Text is highly optimized with the new Query Hash Statistics collection set. In addition, Bart has created a set of reports which offers more insight into the most expensive queries running on your system.

Download the Query Hash Statistics project and unzip the files to a shared location accusable by the computer hosting the MDW and the computers hosting the SQL Server database engines that you’ll collect data from.

Getting Started with Data Collection

I’m going to assume that you’ve already installed SQL Server 2008 or SQL Server 2008 R2 Database Engine and Management Tools. Optionally you can install Reporting Services for running the RDL files that I’ll provide.

The MSDN topic “Getting Started with the Data Collector”, goes over the basic steps for creating the Management Data Warehouse database and the Data Collector, but there a few things that you should know before you begin.

  • Plan on Centralized Collection. The MDW has the most value when the database is on server that is not used for your production workloads. The Data Collector is designed with this configuration in mind. This eliminates most of the “Heisenberg Observer Effect”. If you already have a SQL Server instance running System Center Operations Manager, I recommend putting the MDW database there as a single management host. The instance doesn’t need too much power, just plenty of disk space available.
  • Planning MDW Storage. For pessimistic planning, your MDW will grow 600 MB per day per instance for 14 days (the default retention period for the system collection sets). Once the retention period kicks in, data and collection logs will get purged at 2am local time for the centralized MDW.  You’re looking at 8.4 GB times 30 instances or 252 GB.
  • Creating the MDW via the Wizard. Now that you know the pessimistic size for the MDW, when you run the wizard use this value for the initial size of the MDW versus the default of 100MB. This way you don’t get a performance hit all the time in the MDW as it grows 10MB at a time. You should set the growth size to 10GB as well when new instances are added. You can leave the transaction file size as the default since this rarely grows in size.
  • Keep Security Simple. For the POC, you’ll want to use a single domain user account that has admin rights to the SQL Server instance and also admin rights to the machines that will be reporting data to the MDW. This is not the recommended final configuration, but makes the POC easier to configure. You can always go back to a lower privileged proxy account once you understand all of the moving parts.
  • Set up Data Collection. For each of the instances you want to collect data from, you’ll run the Configure management Data Warehouse Wizard. Once the wizard completes, go to the Query Statistics system collection set under the Data Collector node as shown below and issue the Stop Data Collection Set command.

02 Disable Query Statistics

  • Follow the Quick Start Guide for Query Hash Statistics. This page contains steps to configure the prerequisites, which you can skip since I’ve already covered them above.

After updating the MDW to support the new collection set, installing the collection set on the target instance, and copying the reports as instructed to the MDW server, you are ready to go.

Query Hash Statistics Reports

The reports provided for the SQL Server Query Hash Statistics collection set replace the Query Statistics reports. To open the reports, connect SSMS to the SQL Server instance that hosts your MDW database, right-click on the MDW database in Object Explorer, and select "Custom Report..." from the Reports submenu. Browse to the location where you extracted the Query Hash Statistics files, and open MdwOverviewCustom.rdl. You will not need to manually browse to the .RDL file after this; you'll find the report name in the Reports context menu. The first time you open a custom report, SSMS will ask you to confirm that you trust the report.

You’ll want to wait a few hours after collecting data to run the reports so that you can start making sense of the data.

It’s that easy. In the next blog post, I’ll cover how to install the MDW reports on an SSRS server.

Technorati Tags: Data Collector,MDW,Query Hash Statistics,Proof of Concepts,SQL Server 2008 R2,SQL Server 2008

Be sure to follow me on Twitter at @billramo