PreScan: SQL Database Schema Distribution

Article
11/14/2010

At the very start of a performance tuning exercise I do a Quick Appraisal to assess what I'm dealing with. This is one of a number of “information collection” scripts I run.
This post shows (a) Code & (b) some ideas on insight you can get from it.
What are PreScans?
When someone has a performance problem they often expect you to "know” everything about their system & pinpoint the problems almost before you’ve even seen their system. To bring me up to speed fast, I run a series of trivial light weight queries that give me some statistics & a rough idea of what I’m dealing with. I run these Database scans prior to starting a job, hence the term “Pre-Scan” or “Quick Appraisal” phase.
As I mentioned my “Quick Appraisal” set of queries in a recent presentation to the Sydney SQL Server User Group. I thought I’d post a few on my blog. This is one of them. Note, this is not an in-depth anything, often a 5 sec glance at the results is sufficient. But after looking thru 10-15 of these simple queries, it is amazing how accurate your “first impressions” will be for speculating on the potential problem areas of a system.
Quick Appraisal
There are many different styles of database design. Looking at the number of Objects, the types of Objects & ratios between them, often permits you to make a quick assessment on both the complexity of the database as well as the mindset of those who created it. Knowing this "mindset" permits you assess the probability of where their poor coding practices might lie.
Is it unfair to assume there are suboptimal coding practices in the database? No, no-one will every pay you to tune a system that they think is working well. There is bound to be a bottleneck somewhere. Most organisations will have already tried “throw more H/W at it” ;-)
How does it work?
The above table shows output from 7 different databases. Debtrak, O4, Spinn are production systems used by very large companies. SSRS, MSDB, Master are SQL System Databases. AW08 is the SQL Sample Adventureworks database.
You will note, I’ve created a few extra rows to look at the Primary keys, Foreign Keys & Stored Procs as a percentage of overall tables.
Debtrak: High Use of Stored Procedures (Could be a good thing, unless they are full of cursors). Fairly high number of tables. 1/3 of the tables have no primary key. Almost no foreign keys. But heaps of Stored Procedures, perhaps they do data integrity checks there. If not, could be heaps of duplicate rows & dirty data. Using functions so schema has been developed for SQL2005+ or modified to use newer features introduced in SQL 2005. No CLR, Extended SP’s, or Service Broker, generally DBA’s who implement these in their design are highly skilled in both DBA & Dev. So less likely to find any easy wins.
O4: Lots of Tables & views. Need to check on the PK use, only 8% don’t have them. Very few FK’s so Declarative Referential Integrity (DRI) is not used much, probably done with rules. Lots of Rules, database likely to created on pre-SQL2000 & upgraded. Watch for depreciated items, especially Text & image fields & functions that trim the time off dates. Quite a few triggers, need to check how complex they are.
Spinn: Lots of FK’s (so extremely heavy use of DRI = good), No views or Stored procs. Probably created by a programmer, where all data integrity would be handled by their application. Their app will generate TSQL. This would be a nightmare to tune unless they have Unit & Load tests created for their code. At best you will be creating indexes or playing with H/W. I’d expect to find a lot of dirty data as very few applications do a good job at checking anything other than the datatypes & a drop down list of values. The relations between the fields in a row are often not checked at all eg: This results in Pregnant males, Customers born in the future, Orders delivered before they were placed & 32 year old Primary School students. At least the relationships between rows should be good.
SSRS (SQL Reporting Services): Good use of Stored Procs & FK’s. Very few tables. Not a lot to look at, easy to understand.
AdventureWorks 2008: Modest number of tables. DRI is brilliant, data likely to be of good quality, especially when you look at the number of Rules & Check constraints. Few stored procs, so app likely to be passing TSQL. Rules also suggest an older design, so look for depreciated constructs.
Master: The numbers suggest there are many things wrong with this database. The only good news is that a tiny number of tables suggest it is not used by more than 1 app. Of course in this case we know that since SQL2005 the Master DB was split in two. The other part is the resource database, which is not shown here. So it makes little sense drawing any conclusion about it other than “Hey it is a small number of tables, should be easy to figure out what they do.”
Sample of TSQL for this type of Pre-Scan
Version 1. - Faster, Sort in Order you want.
Only shows objects that exist. It will also display new Objects Types if they are introduced in future versions of SQL
USE O4Jul09
go
;WITH ObjectTypes AS (
SELECT [type], type_desc, COUNT(*) AS [Count]
FROM sys.objects
GROUP BY [type], type_desc
)
SELECT *
FROM ObjectTypes
ORDER BY CASE [type]
WHEN 'SN' THEN 10 -- Synonym
WHEN 'V' THEN 11 -- View
WHEN 'U' THEN 20 -- User Table
WHEN 'PK' THEN 21 -- PRIMARY KEY constraint
WHEN 'F' THEN 22 -- FOREIGN KEY constraint
WHEN 'UQ' THEN 23 -- UNIQUE constraint
WHEN 'C' THEN 24 -- CHECK constraint
WHEN 'D' THEN 25 -- DEFAULT (constraint or stand-alone)
WHEN 'R' THEN 26 -- Rule (old-style, stand-alone)
WHEN 'TR' THEN 30 -- SQL Trigger
WHEN 'SQ' THEN 31 -- Service queue
WHEN 'P' THEN 40 -- SQL Stored Procedure
WHEN 'TF' THEN 41 -- SQL Table-valued-function
WHEN 'IF' THEN 42 -- SQL Inline Table-valued function
WHEN 'FN' THEN 43 -- SQL Scalar function
WHEN 'X' THEN 50 -- Extended stored procedure
WHEN 'RF' THEN 51 -- Replication-filter-procedure
WHEN 'PC' THEN 60 -- Assembly (CLR) stored-procedure
WHEN 'FT' THEN 61 -- Assembly (CLR) table-valued function
WHEN 'FS' THEN 62 -- Assembly (CLR) scalar-function
WHEN 'AF' THEN 63 -- Aggregate function (CLR)
WHEN 'TA' THEN 64 -- Assembly (CLR) DML trigger
WHEN 'PG' THEN 80 -- Plan guide
WHEN 'IT' THEN 90 -- Internal table
WHEN 'S' THEN 91 -- System table
WHEN 'TT' THEN 92 -- TYPE_TABLE
ELSE 100
END
go
Version 2. - Slightly Slower. Highlights which object types the schema doesn't use.
Only shows objects that exist
USE O4Jul09
go
;WITH ObjectTypes AS (
SELECT [type], type_desc, COUNT(*) AS [Count]
FROM sys.objects
GROUP BY [type], type_desc
)
, ObjectGroups AS (
SELECT 'SN' AS [type] , 10 AS SortOrder, 'Synonym' AS type_desc, 'Any Redirection to be aware of?' AS Action UNION ALL
SELECT 'V' , 11 , 'View', 'Does it make extensive use of Views?' UNION ALL
SELECT NULL, 19, '.', '.' UNION ALL
--< User Tables >--
SELECT 'U' , 20, 'User TABLE', 'Amount of tables' UNION ALL
SELECT 'PK', 21, 'PRIMARY KEY constraint', '% of Tables with PK''s defined' UNION ALL
SELECT 'F' , 22, 'FOREIGN KEY constraint', 'Are they using DRI?' UNION ALL
SELECT 'UQ', 23, 'UNIQUE constraint', 'Check M:M join tables are correctly protected' UNION ALL
SELECT 'C' , 24, 'CHECK constraint', 'Are Data values protected by schema?' UNION ALL
SELECT 'D' , 25, 'DEFAULT (constraint or stand-alone)', 'Do they rely on Defaults or SPs or nothing' UNION ALL
SELECT 'R' , 25, 'RULE (old-style, stand-alone)', 'Are they using Depreciated Checks' UNION ALL
SELECT NULL, 29, '.', '.' UNION ALL
--
SELECT 'TR', 30, 'TRIGGER', 'Look at ratio vs User Tables. Heavy / Light use of Triggers?' UNION ALL
--< SQL Broker >--
SELECT 'SQ', 31, 'Service QUEUE', 'Are they using Async "Triggers" or SQL Broker?' UNION ALL
SELECT NULL, 39, '.', '.' UNION ALL
--< Procedural Code >--
SELECT 'P' , 40, 'Stored Procedure', 'Number & Ratio to User Tables' UNION ALL
SELECT 'TF', 41, 'Table-Valued-function', 'Are they being used?' UNION ALL
SELECT 'IF', 42, 'Inline Table-Valued function', 'Are they being used?' UNION ALL
SELECT 'FN', 43, 'Scalar function', 'Are they being used?' UNION ALL
SELECT NULL, 49, '.', '.' UNION ALL
--
SELECT 'X', 50, 'Extended stored procedure', 'Need to think about security' UNION ALL
SELECT 'RF', 51, 'Replication-filter-procedure', 'Will need to think about external connections' UNION ALL
SELECT NULL, 59, '.', '.' UNION ALL
--< CLR Use >--
SELECT 'PC' , 60, 'Assembly (CLR) stored-procedure', 'Are they useing CLR?' UNION ALL
SELECT 'FT', 61, 'Assembly (CLR) table-valued function', 'Are they useing CLR?' UNION ALL
SELECT 'FS', 62, 'Assembly (CLR) scalar-function', 'Are they useing CLR?' UNION ALL
SELECT 'AF', 63, 'Aggregate function (CLR)', 'Are they useing CLR?' UNION ALL
SELECT 'TA', 64, 'Assembly (CLR) DML trigger', 'Are they useing CLR?' UNION ALL
SELECT NULL, 69, '.', '.' UNION ALL
SELECT NULL, 79, '.', '.' UNION ALL
--< Other >--
SELECT 'PG', 80, 'Plan guide', 'Some serious compatibility work has been done on this DB' UNION ALL
SELECT NULL, 89, '.', '.' UNION ALL
--
SELECT 'IT', 90, 'INTERNAL_TABLE', '-' UNION ALL
SELECT 'S' , 91, 'SYSTEM_TABLE', '-' UNION ALL
SELECT 'TT', 92, 'TYPE_TABLE', '-'
)
SELECT COALESCE(G.[type], T.[type], '.') AS Type, COALESCE(G.type_desc, T.type_desc) AS [Type Description], ISNULL(CAST(T.[Count] AS VARCHAR(8)), '.') AS Quantity, G.Action
FROM ObjectGroups AS G
FULL JOIN ObjectTypes AS T ON T.[type] = G.[type]
ORDER BY SortOrder

Examples of insight you can gain from this information.

User Tables

The more they have the more complex the schema. Obvious huh?

Primary Keys

Generally every table should have one. If they don't there should be a really good reason. Find out what it is. More often I hear "Oh! Really", "Yer well I was in a hurry that day & was going to get back to it." or "That was when I was new" or “What’s a Primary key?”.
Sometimes are good reasons eg: these are “Staging Tables” where data is bulk loaded into them, manipulated with Table scans & moved on. This supports an Extract, Load & Transform (ELT) style of data movement. (NB: Most people still refer to this as Extract, Transform & Load (ETL) processing). You may think that this activity may be better handled in TempDB, but sometimes a complex series of batch load processes need; i)a way to pass data from one to the other or ii) die & be restarted without doing everything from scratch.

Foreign Keys

Are they using Declarative Referential Integrity (DRI)? If not, Why not?

Perhaps this is a Data warehouse & the relationships were validated in the staging database or the ETL process.

Perhaps they are relying on Stored Procs & Triggers - fragile & possibly slow.

You should ensure that FK Constraints have a useful index to enforce them. Without it, queries can deadlock on themselves &/or cause table scans even on single row modifications. Note: That does not mean you should automatically create a Non-Clustered index on just that specific column. Perhaps adding another column as a composite key or as an Included field will permit the index to cover the needs of a commonly run query as well.

Unique Constraints

These may be unnecessary if the Primary Key fields were selected correctly but if there are none, it raises an alarm bell.
- Look for tables designed to support Many to Many Joins. There is a high probability they've not been indexed / constrained correctly. Resulting in duplicate rows & highly inefficient queries.
- Most databases I've seen suffer from duplicate rows. Especially ones with Web front ends that don't prevent the users from hitting OK multiple times when the interface fails to respond quickly. The only difference in the rows is the Auto-generated Primary key. A Unique constraint can prevent that but effectively you are creating the entire table again in the index. So use with care. It is probably better to make the two foreign Key columns a composite Primary key & remove the Identity column you are currently using as a primary key.

Check Constraints

The use of Check Constraints tends to suggest the database was designed by a DBA rather than a programmer. Programmers tend to validate the input in Stored Procedures &/or their application. (NB: I did not say one approach was always better than the other. )
Check Constraints are vital to get all the performance benefits from a Partitioned table.

Default Constraints

Very few or zero Defaults, suggests they are setting these values in Stored Proc's or in the App code.
Be alert, DB's with high numbers of Default, often use the poor practise of Insert a row & then immediately Select it to determine values of the Defaults & Identity fields.

This often results in an index search to Update/Insert a row & another to select it. Also resulting in 2 network round trips, so more network traffic.

If you need to return the values generated by your Insert ie: set by Identity, Default, & triggers. Use an OUTPUT Clause on your insert, instead of re-Selecting the rows you've just inserted.

-- < Inefficient 2 Separate queries, 2 Network round trips >--

INSERT INTO InsertOutput(name) VALUES('Dave11')

SELECT * FROM InsertOutput WHERE name = 'Dave11'

-- < More efficient 1 query, 1 Network trip, ~33+% faster >--

INSERT INTO InsertOutput(name)

OUTPUT inserted.ID, inserted.Name, inserted.DateIn, inserted.UserName

VALUES ('Dave21')

Rules

If you find any, make time to rewrite them as CHECK Constraints.
I really liked the old style Rules, Defaults & User Defined Types as they could be created once & use in many tables. Thus ensuring consistency throughout the design. Unfortunately they are not ANSI compliant & are depreciated. Also you needed to be sure of your decisions when you created them as changing them caused a maintenance nightmare.

Triggers

Debugging Triggers &/or assessing their impact on performance can be very tricky. Knowing that Triggers exist in the design helps you factor that into your plans. Be aware that neither sp_depends nor the dependency DMV's do a perfect job at looking at objects called by triggers.
Look for INSTEAD OF triggers they may completely change the impact of what you thought DML statements would do to that table.
Where Triggers are used to maintain other entities see if you can rewrite it using Queues. Eg: If placing a Customer order causes a trigger to fire that initiates a workflow to restock your warehouse if the stock level is low. This activity is often better handled asynchronously using SQL broker. One benefit is that your end user will get a more predictable response time when their Order placement transaction is not suddenly burdened with the overhead of restocking the warehouse.

Service Queues

If they are not being used, it is likely that they could be beneficial .

Stored Procedures

Look at the number compared to User Tables.
If there are 3 SP's for each table it is likely a poor design. An Insert, an Update & a Delete stored procedure, is unlikely to encapsulate logic very well.
If there are very few SP's it is likely to become a maintenance nightmare. Without the SP's to abstract the Database schema from the application, all the queries will be created & generated in code. Making changes to the schema will be almost impossible to test. (Unless they have 100% code coverage in their Unit Tests & are not using Mocking to prevent the tests actually calling the DB. This is very rare)

SQL Functions

When used carefully these can improve code reuse & improve performance. Most often these create big perf issues. Watch for :-

- Calling a trivial function from within a WHERE clause, if may prevent you using an index & result in a table scan.
- Calling a Function from a Select list. It gets called once for each row in the resultset, which could be a huge perf hit. Often it is better to rewrite the Function as part of the Join clause.

Extended Stored Procedures

Check security is done correctly.
Check if they are necessary. Perhaps it is only there to support 1 batch process which might be better done via SQLCMD, Powershell or SSIS Workflow.

Replication Filter Procedures

This gives you a hint that you need to think about the replication system & interop with other SQL databases.

CLR Objects

Lets you know if they are being used.
They can be extremely powerful especially to enhance string manipulation.
These are not used by many people. Typically they are only suggested by DBA's with an extensive knowledge of the entire feature set of SQL Server & fairly good programming skills. So generally the DB design will be sound, there will be no easy wins & you will need to work hard to prove your value.

Plan Guides

Again these are an advanced feature. Anyone who created them would need to be extremely familiar with query plan optimisation. Thus you will need to think hard about any changes you make.

As always I’m interested in any thoughts & suggestions you may have

Dave

Random Thought for the post:

“Look before you leap” might be good advice. But it is unlikely to be a great legal defence if police catch you hanging off the roof, staring into the change room window at the local public pool.

PreScan: SQL Database Schema Distribution

What are PreScans?

Quick Appraisal

How does it work?

Sample of TSQL for this type of Pre-Scan

Examples of insight you can gain from this information.

Additional resources