PreScan: SQL Database Schema Distribution

 

  1. At the very start of a performance tuning exercise I do a Quick Appraisal to assess what I'm dealing with.  This is one of a number of “information collection” scripts I run.

  2. This post shows (a) Code & (b) some ideas on insight you can get from it.

  3.  

  4. What are PreScans?

  5. When someone has a performance problem they often expect you to "know” everything about their system & pinpoint the problems almost before you’ve even seen their system. To bring me up to speed fast, I run a series of trivial light weight queries that give me some statistics & a rough idea of what I’m dealing with. I run these Database scans prior to starting a job, hence the term “Pre-Scan” or “Quick Appraisal” phase.

  6.  

  7. As I mentioned my “Quick Appraisal” set of queries in a recent presentation to the Sydney SQL Server User Group. I thought I’d post a few on my blog.  This is one of them. Note, this is not an in-depth anything, often a 5 sec glance at the results is sufficient. But after looking thru 10-15 of these simple queries, it is amazing how accurate your “first impressions” will be for speculating on the potential problem areas of a system.

  8.  

  9. Quick Appraisal

  10. There are many different styles of database design.  Looking at the number of Objects, the types of Objects & ratios between them, often permits you to make a quick assessment on both the complexity of the database as well as the mindset of those who created it. Knowing this "mindset" permits you assess the probability of where their poor coding practices might lie. 

  11. Is it unfair to assume there are suboptimal coding practices in the database? No, no-one  will every pay you to tune a system that they think is working well. There is bound to be a bottleneck somewhere. Most organisations will have already tried “throw more H/W at it”  ;-)

  12.  

  13.  

  14. How does it work?

  15.  

  16.  

  17. The above table shows output from 7 different databases. Debtrak, O4, Spinn are production systems used by very large companies. SSRS, MSDB, Master are SQL System Databases. AW08 is the SQL Sample Adventureworks database.

  18.  

  19. You will note, I’ve created a few extra rows to look at the Primary keys, Foreign Keys & Stored Procs as a percentage of overall tables.

  20.  

  21. Debtrak: High Use of Stored Procedures (Could be a good thing, unless they are full of cursors). Fairly high number of tables. 1/3 of the tables have no primary key. Almost no foreign keys. But heaps of Stored Procedures, perhaps they do data integrity checks there. If not, could be heaps of duplicate rows & dirty data. Using functions so schema has been developed for SQL2005+ or modified to use newer features introduced in SQL 2005. No CLR, Extended SP’s, or Service Broker, generally DBA’s who implement these in their design are highly skilled in both DBA & Dev. So less likely to find any easy wins.

  22.  

  23. O4: Lots of Tables & views. Need to check on the PK use, only 8% don’t have them. Very few FK’s so Declarative Referential Integrity (DRI) is not used much, probably done with rules. Lots of Rules, database likely to created on pre-SQL2000 & upgraded. Watch for depreciated items, especially Text & image fields & functions that trim the time off dates. Quite a few triggers, need to check how complex they are.

  24.  

  25. Spinn: Lots of FK’s (so extremely heavy use of DRI = good), No views or Stored procs. Probably created by a programmer, where all data integrity would be handled by their application. Their app will generate TSQL. This would be a nightmare to tune unless they have Unit & Load tests created for their code. At best you will be creating indexes or playing with H/W. I’d expect to find a lot of dirty data as very few applications do a good job at checking anything other than the datatypes & a drop down list of values. The relations between the fields in a row are often not checked at all eg: This results in Pregnant males, Customers born in the future, Orders delivered before they were placed & 32 year old Primary School students. At least the relationships between rows should be good.

  26.  

  27. SSRS (SQL Reporting Services): Good use of Stored Procs & FK’s. Very few tables. Not a lot to look at, easy to understand.

  28.  

  29. AdventureWorks 2008: Modest number of tables. DRI is brilliant, data likely to be of good quality, especially when you look at the number of Rules & Check constraints. Few stored procs, so app likely to be passing TSQL. Rules also suggest an older design, so look for depreciated constructs.

  30.  

  31. Master: The numbers suggest there are many things wrong with this database. The only good news is that a tiny number of tables suggest it is not used by more than 1 app. Of course in this case we know that since SQL2005 the Master DB was split in two. The other part is the resource database, which is not shown here. So it makes little sense drawing any conclusion about it other than “Hey it is a small number of tables, should be easy to figure out what they do.”

  32.  

  33. Sample of TSQL for this type of Pre-Scan

  34. Version 1.  - Faster, Sort in Order you want.

  35. Only shows objects that exist. It will also display new Objects Types if they are introduced in future versions of SQL

  36.  

  37. USE O4Jul09

  38. go

  39. ;WITH ObjectTypes AS (

  40. SELECT [type], type_desc, COUNT(*) AS [Count]

  41. FROM sys.objects

  42. GROUP BY [type], type_desc

  43. )

  44. SELECT *

  45. FROM ObjectTypes

  46. ORDER BY CASE [type]

  47. WHEN 'SN' THEN 10 -- Synonym

  48. WHEN 'V' THEN 11 -- View

  49. WHEN 'U' THEN 20 -- User Table

  50. WHEN 'PK' THEN 21 -- PRIMARY KEY constraint

  51. WHEN 'F' THEN 22 -- FOREIGN KEY constraint

  52. WHEN 'UQ' THEN 23 -- UNIQUE constraint

  53. WHEN 'C' THEN 24 -- CHECK constraint

  54. WHEN 'D' THEN 25 -- DEFAULT (constraint or stand-alone)

  55. WHEN 'R' THEN 26 -- Rule (old-style, stand-alone)

  56. WHEN 'TR' THEN 30 -- SQL Trigger

  57. WHEN 'SQ' THEN 31 -- Service queue

  58. WHEN 'P' THEN 40 -- SQL Stored Procedure

  59. WHEN 'TF' THEN 41 -- SQL Table-valued-function

  60. WHEN 'IF' THEN 42 -- SQL Inline Table-valued function

  61. WHEN 'FN' THEN 43 -- SQL Scalar function

  62. WHEN 'X' THEN 50 -- Extended stored procedure

  63. WHEN 'RF' THEN 51 -- Replication-filter-procedure

  64. WHEN 'PC' THEN 60 -- Assembly (CLR) stored-procedure

  65. WHEN 'FT' THEN 61 -- Assembly (CLR) table-valued function

  66. WHEN 'FS' THEN 62 -- Assembly (CLR) scalar-function

  67. WHEN 'AF' THEN 63 -- Aggregate function (CLR)

  68. WHEN 'TA' THEN 64 -- Assembly (CLR) DML trigger

  69. WHEN 'PG' THEN 80 -- Plan guide

  70. WHEN 'IT' THEN 90 -- Internal table

  71. WHEN 'S' THEN 91 -- System table

  72. WHEN 'TT' THEN 92 -- TYPE_TABLE

  73. ELSE 100

  74. END

  75. go

  76.  

  77. Version 2.  - Slightly Slower.  Highlights which object types the schema doesn't use. 

  78. Only shows objects that exist

  79.  

  80. USE O4Jul09

  81. go

  82. ;WITH ObjectTypes AS (

  83. SELECT [type], type_desc, COUNT(*) AS [Count]

  84. FROM sys.objects

  85. GROUP BY [type], type_desc

  86. )

  87. , ObjectGroups AS (

  88. SELECT 'SN' AS [type] , 10 AS SortOrder, 'Synonym' AS type_desc, 'Any Redirection to be aware of?' AS Action UNION ALL

  89. SELECT 'V' , 11 , 'View', 'Does it make extensive use of Views?' UNION ALL

  90. SELECT NULL, 19, '.', '.' UNION ALL

  91. --< User Tables >--

  92. SELECT 'U' , 20, 'User TABLE', 'Amount of tables' UNION ALL

  93. SELECT 'PK', 21, 'PRIMARY KEY constraint', '% of Tables with PK''s defined' UNION ALL

  94. SELECT 'F' , 22, 'FOREIGN KEY constraint', 'Are they using DRI?' UNION ALL

  95. SELECT 'UQ', 23, 'UNIQUE constraint', 'Check M:M join tables are correctly protected' UNION ALL

  96. SELECT 'C' , 24, 'CHECK constraint', 'Are Data values protected by schema?' UNION ALL

  97. SELECT 'D' , 25, 'DEFAULT (constraint or stand-alone)', 'Do they rely on Defaults or SPs or nothing' UNION ALL

  98. SELECT 'R' , 25, 'RULE (old-style, stand-alone)', 'Are they using Depreciated Checks' UNION ALL

  99. SELECT NULL, 29, '.', '.' UNION ALL

  100. --

  101. SELECT 'TR', 30, 'TRIGGER', 'Look at ratio vs User Tables. Heavy / Light use of Triggers?' UNION ALL

  102. --< SQL Broker >--

  103. SELECT 'SQ', 31, 'Service QUEUE', 'Are they using Async "Triggers" or SQL Broker?' UNION ALL

  104. SELECT NULL, 39, '.', '.' UNION ALL

  105. --< Procedural Code >--

  106. SELECT 'P' , 40, 'Stored Procedure', 'Number & Ratio to User Tables' UNION ALL

  107. SELECT 'TF', 41, 'Table-Valued-function', 'Are they being used?' UNION ALL

  108. SELECT 'IF', 42, 'Inline Table-Valued function', 'Are they being used?' UNION ALL

  109. SELECT 'FN', 43, 'Scalar function', 'Are they being used?' UNION ALL

  110. SELECT NULL, 49, '.', '.' UNION ALL

  111. --

  112. SELECT 'X', 50, 'Extended stored procedure', 'Need to think about security' UNION ALL

  113. SELECT 'RF', 51, 'Replication-filter-procedure', 'Will need to think about external connections' UNION ALL

  114. SELECT NULL, 59, '.', '.' UNION ALL

  115. --< CLR Use >--

  116. SELECT 'PC' , 60, 'Assembly (CLR) stored-procedure', 'Are they useing CLR?' UNION ALL

  117. SELECT 'FT', 61, 'Assembly (CLR) table-valued function', 'Are they useing CLR?' UNION ALL

  118. SELECT 'FS', 62, 'Assembly (CLR) scalar-function', 'Are they useing CLR?' UNION ALL

  119. SELECT 'AF', 63, 'Aggregate function (CLR)', 'Are they useing CLR?' UNION ALL

  120. SELECT 'TA', 64, 'Assembly (CLR) DML trigger', 'Are they useing CLR?' UNION ALL

  121. SELECT NULL, 69, '.', '.' UNION ALL

  122. SELECT NULL, 79, '.', '.' UNION ALL

  123.     --< Other >--

  124. SELECT 'PG', 80, 'Plan guide', 'Some serious compatibility work has been done on this DB' UNION ALL

  125. SELECT NULL, 89, '.', '.' UNION ALL

  126. --

  127. SELECT 'IT', 90, 'INTERNAL_TABLE', '-' UNION ALL

  128. SELECT 'S' , 91, 'SYSTEM_TABLE', '-' UNION ALL

  129. SELECT 'TT', 92, 'TYPE_TABLE', '-'

  130. )

  131. SELECT COALESCE(G.[type], T.[type], '.') AS Type, COALESCE(G.type_desc, T.type_desc) AS [Type Description], ISNULL(CAST(T.[Count] AS VARCHAR(8)), '.') AS Quantity, G.Action

  132. FROM ObjectGroups AS G

  133. FULL JOIN ObjectTypes AS T ON T.[type] = G.[type]

  134. ORDER BY SortOrder

  135.  

  136.  

  137.  

 

Examples of insight you can gain from this information.

User Tables

 The more they have the more complex the schema. Obvious huh?

 

Primary Keys

 Generally every table should have one. If they don't there should be a really good reason. Find out what it is. More often I hear "Oh! Really", "Yer well I was in a hurry that day & was going to get back to it." or "That was when I was new" or “What’s a Primary key?”.
Sometimes are good reasons eg: these are “Staging Tables” where data is bulk loaded into them, manipulated with Table scans & moved on. This supports an Extract, Load & Transform (ELT) style of data movement. (NB: Most people still refer to this as Extract, Transform & Load (ETL) processing). You may think that this activity may be better handled in TempDB, but sometimes a complex series of batch load processes need; i)a way to pass data from one to the other or ii) die & be restarted without doing everything from scratch.

 

Foreign Keys

Are they using Declarative Referential Integrity (DRI)? If not, Why not?

Perhaps this is a Data warehouse & the relationships were validated in the staging database or the ETL process.

Perhaps they are relying on Stored Procs & Triggers - fragile & possibly slow.

You should ensure that FK Constraints have a useful index to enforce them. Without it, queries can deadlock on themselves &/or cause table scans even on single row modifications.  Note: That does not mean you should automatically create a Non-Clustered index on just that specific column. Perhaps adding another column as a composite key or as an Included field will  permit the index to cover the needs of a commonly run query as well.

 

Unique Constraints

  • These may be unnecessary if the Primary Key fields were selected correctly but if there are none, it raises an alarm bell.
    • Look for tables designed to support Many to Many Joins. There is a high probability they've not been indexed / constrained correctly. Resulting in duplicate rows & highly inefficient queries.
    • Most databases I've seen suffer from duplicate rows. Especially ones with Web front ends that don't prevent the users from hitting OK multiple times when the interface fails to respond quickly. The only difference in the rows is the Auto-generated Primary key. A Unique constraint can prevent that but effectively you are creating the entire table again in the index. So use with care. It is probably better to make the two foreign Key columns a composite Primary key & remove the Identity column you are currently using as a primary key.

 

Check Constraints

  • The use of Check Constraints tends to suggest the database was designed by a DBA rather than a programmer. Programmers tend to validate the input in Stored Procedures &/or their application. (NB: I did not say one approach was always better than the other. )
  • Check Constraints are vital to get all the performance benefits from a Partitioned table.

Default Constraints

  • Very few or zero Defaults, suggests they are setting these values in Stored Proc's or in the App code.
  • Be alert, DB's with high numbers of Default, often use the poor practise of Insert a row & then immediately Select it to determine values of the Defaults & Identity fields.

This often results in an index search to Update/Insert a row & another to select it. Also resulting in 2 network round trips, so more network traffic.

If you need to return the values generated by your Insert ie: set by Identity, Default,  & triggers.  Use an OUTPUT Clause on your insert, instead of re-Selecting the rows you've just inserted. 

-- < Inefficient 2 Separate queries, 2 Network round trips >--

INSERT INTO InsertOutput(name) VALUES('Dave11')   

go

SELECT * FROM InsertOutput WHERE name = 'Dave11'

go

 

-- < More efficient 1 query, 1 Network trip, ~33+% faster >--

INSERT INTO InsertOutput(name)

OUTPUT inserted.ID, inserted.Name, inserted.DateIn, inserted.UserName

VALUES ('Dave21')

go

 

Rules

  • If you find any, make time to rewrite them as CHECK Constraints.
  • I really liked the old style Rules, Defaults & User Defined Types as they could be created once & use in many tables. Thus ensuring consistency throughout the design. Unfortunately they are not ANSI compliant & are depreciated. Also you needed to be sure of your decisions when you created them as changing them caused a maintenance nightmare.

 

Triggers

  • Debugging Triggers &/or assessing their impact on performance can be very tricky. Knowing that Triggers exist in the design helps you factor that into your plans. Be aware that neither sp_depends nor the dependency DMV's do a perfect job at looking at objects called by triggers.
  • Look for INSTEAD OF triggers they may completely change the impact of what you thought DML statements would do to that table.
  • Where Triggers are used to maintain other entities see if you can rewrite it using Queues. Eg: If placing a Customer order causes a trigger to fire that initiates a workflow to restock your warehouse if the stock level is low. This activity is often better handled asynchronously using SQL broker. One benefit is that your end user will get a more predictable response time when their Order placement transaction is not suddenly burdened with the overhead of restocking the warehouse.

Service Queues

  • If they are not being used, it is likely that they could be beneficial .

 

Stored Procedures

  • Look at the number compared to User Tables.
  • If there are 3 SP's for each table it is likely a poor design. An Insert, an Update & a Delete stored procedure, is unlikely to encapsulate logic very well.
  • If there are very few SP's it is likely to become a maintenance nightmare. Without the SP's to abstract the Database schema from the application, all the queries will be created & generated in code. Making changes to the schema will be almost impossible to test. (Unless they have 100% code coverage in their Unit Tests & are not using Mocking to prevent the tests actually calling the DB. This is very rare)

 

SQL Functions

When used carefully these can improve code reuse & improve performance. Most often these create big perf issues. Watch for :-

    • Calling a trivial function from within a WHERE clause, if may prevent you using an index & result in a table scan.
    • Calling a Function from a Select list. It gets called once for each row in the resultset, which could be a huge perf hit. Often it is better to rewrite the Function as part of the Join clause.

Extended Stored Procedures

  • Check security is done correctly.
  • Check if they are necessary. Perhaps it is only there to support 1 batch process which might be better done via SQLCMD, Powershell or SSIS Workflow.

 

Replication Filter Procedures

This gives you a hint that you need to think about the replication system & interop with other SQL databases.

 

CLR Objects

  • Lets you know if they are being used.
  • They can be extremely powerful especially to enhance string manipulation.
  • These are not used by many people. Typically they are only suggested by DBA's with an extensive knowledge of the entire feature set of SQL Server & fairly good programming skills. So generally the DB design will be sound, there will be no easy wins & you will need to work hard to prove your value.

 

Plan Guides

  • Again these are an advanced feature. Anyone who created them would need to be extremely familiar with query plan optimisation. Thus you will need to think hard about any changes you make.

As always I’m interested in any thoughts & suggestions you may have

Dave

 

Random Thought for the post:

“Look before you leap” might be good advice. But it is unlikely to be a great legal defence if police catch you hanging off the roof, staring into the change room window at the local public pool.