As part of the blog series TSQL – Solve it YOUR Way, today’s topic will cover a question asked in the Transact-SQL MSDN Forums here, followed by three different solutions from three of the more helpful and creative contributors in the TSQL MSDN forums, Jens Suessmeyer, Naomi Nosonovsky, and Jingyang Li.
Topic: How can I find the percentage of NULL values for each column in my table?
For various reasons, people may need to determine the percentage of values in each column that are NULL. Perhaps you are trying to calculate the percentage to determine if your table could save disk space by leveraging SPARSE columns. In another real-world example from the forums, a user had an implementation where partial rows in a table were being updated at different times. When all rows were updated with a value and were thus NOT NULL, the table was considered “complete” and could be processed. For this reason, the customer was trying to determine when the percentage of NULL values hit 0%. In yet another case, the user wanted to present a comparison chart throughout the entire database showing the ratio of NULL values to NOT NULL values. For the final example, take a look at the explanation provided below by Jens where he solved this problem at a customer site.
Solution #1 – Provided by Jens Suessmeyer
Explanation of Jens’ solution:
To give a bit more context for my solution: I was at a customer site where they found that their API was occasionally losing information on the way from the front end to the database, thereby NULLing out specific fields that should not be NULL. While they were checking through their database and writing queries for their hundreds of tables, they asked me, although I was working in another work stream, to write something they could use. The simple idea is to take a table name (and, of course, the appropriate schema) and create a dynamic SQL statement that calculates the percentage of NULL values in each column of that table.
From the query the following information will be returned:
In detail, we see that Color is not always entered (on purpose or not): 49% of the values in that column are NULL. The figure is 58% for Size, 65% for SizeUnitMeasureCode, etc. The remaining columns always contain a value (or at least a non-NULL one). If you are also interested in the percentage of values that are NULL or empty, the query can easily be tweaked by changing the following line of the source code from:
@Statement = @Statement + 'SUM(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) AS ' + COLUMN_NAME + ',' + CHAR(13) ,
to:
@Statement = @Statement + 'SUM(CASE WHEN LEN(ISNULL(' + COLUMN_NAME + ',' + REPLICATE(CHAR(39),2) + ')) = 0 THEN 1 ELSE 0 END) AS ' + COLUMN_NAME + ',' + CHAR(13) ,
Leading to a change in the generated code of:
SUM(CASE WHEN LEN(ISNULL(CompanyName, '')) = 0 THEN 1 ELSE 0 END) AS CompanyName,
Feel free to include your own logic and forensic pattern to find all the black holes in your databases.
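For readers who want to try the idea, here is a minimal sketch of the approach described above. It is not Jens' original script. The schema and table names (Production.Product) are placeholders chosen for illustration, and the QUOTENAME calls, the NULLIF guard against an empty table, and the conversion to a percentage are additions of this sketch.

```sql
-- Hypothetical target table; replace with your own schema and table.
DECLARE @SchemaName sysname = N'Production';
DECLARE @TableName  sysname = N'Product';
DECLARE @Statement  nvarchar(max) = N'';

-- Build one expression per column: the percentage of rows where the column IS NULL.
-- Variable concatenation in a SELECT mirrors the pattern quoted above; it is kept
-- free of ORDER BY because ordered variable concatenation is not guaranteed to work.
SELECT @Statement = @Statement
    + N'CAST(SUM(CASE WHEN ' + QUOTENAME(COLUMN_NAME)
    + N' IS NULL THEN 1 ELSE 0 END) * 100.0 / NULLIF(COUNT(*), 0) AS decimal(5, 2)) AS '
    + QUOTENAME(COLUMN_NAME) + N',' + CHAR(13)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
  AND TABLE_NAME   = @TableName;

IF @Statement = N''
    RAISERROR(N'Table not found or it has no columns.', 16, 1);
ELSE
BEGIN
    -- Drop the trailing comma and CHAR(13), then wrap the column list in a SELECT.
    SET @Statement = N'SELECT ' + LEFT(@Statement, LEN(@Statement) - 2)
        + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';';

    EXEC sys.sp_executesql @Statement;
END
```

The result is a single row with one percentage per column. An empty table yields NULL in every column rather than a divide-by-zero error.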
Solution #2 – Provided by Naomi Nosonovsky
Explanation of Naomi’s solution:
- To get the percent of null values in a column we use this formula
sum(case when ColumnName IS NULL then 1 else 0 end) * 100.00 / @TotalCount
where @TotalCount is the number of rows in a table.
- We need to generate the above statement dynamically, so we use the INFORMATION_SCHEMA.COLUMNS view to do so.
- I use the FOR XML PATH('') approach to concatenate the information into one variable.
- I use the sp_executesql system stored procedure to run the dynamic SQL so that I can supply a variable. Alternatively, of course, I could have declared and calculated @TotalCount inside the dynamic SQL.
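The steps above can be sketched roughly as follows. This is an illustration of the listed techniques, not Naomi's original code. The table name (Production.Product) is a placeholder, and the variable names other than @TotalCount, the QUOTENAME calls, the NULLIF guard, and the TYPE/.value() wrapper around FOR XML PATH('') are assumptions of this sketch.

```sql
-- Hypothetical target table; replace with your own schema and table.
DECLARE @SchemaName sysname = N'Production';
DECLARE @TableName  sysname = N'Product';
DECLARE @FullName   nvarchar(517) = QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName);
DECLARE @TotalCount int;
DECLARE @Columns    nvarchar(max);
DECLARE @SQL        nvarchar(max);

-- Step 1: get the number of rows in the table via an OUTPUT parameter.
SET @SQL = N'SELECT @TotalCount = COUNT(*) FROM ' + @FullName + N';';
EXEC sys.sp_executesql @SQL,
     N'@TotalCount int OUTPUT',
     @TotalCount = @TotalCount OUTPUT;

-- Step 2: concatenate one percentage expression per column with FOR XML PATH('').
-- STUFF removes the leading ', ' from the concatenated string.
SELECT @Columns = STUFF((
    SELECT N', SUM(CASE WHEN ' + QUOTENAME(COLUMN_NAME)
         + N' IS NULL THEN 1 ELSE 0 END) * 100.00 / NULLIF(@TotalCount, 0) AS '
         + QUOTENAME(COLUMN_NAME)
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = @SchemaName
      AND TABLE_NAME   = @TableName
    ORDER BY ORDINAL_POSITION
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

-- Step 3: run the generated statement, supplying @TotalCount as a parameter.
SET @SQL = N'SELECT ' + @Columns + N' FROM ' + @FullName + N';';
EXEC sys.sp_executesql @SQL,
     N'@TotalCount int',
     @TotalCount = @TotalCount;
```

Passing @TotalCount as a parameter keeps the row count out of the generated string. As the last bullet notes, computing it inside the dynamic SQL would work as well.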
Solution #3 – Provided by Jingyang Li
Explanation of Jingyang’s solution:
Jingyang initially sent a solution very similar to Naomi's, so in order to provide an additional technique, he also provided the CURSOR based solution above. The cursor is built on the INFORMATION_SCHEMA.COLUMNS view, which all three solutions use, but Jingyang adds a nice optimization: he filters on IS_NULLABLE = 'YES' so that only NULLable columns are examined. In the CURSOR loop, the solution relies on some creative use of the OVER() clause to generate the percentages, and the results are displayed in a way that is very readable and concise.
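A cursor-based variant along these lines might look like the sketch below. It is not Jingyang's original code, and the way OVER() is used here (empty-window aggregates combined with TOP (1)) is only one possible reading of the description. The table name (Production.Product), the temp table #NullPercent, and the cursor and variable names are all hypothetical.

```sql
-- Hypothetical target table; replace with your own schema and table.
DECLARE @SchemaName sysname = N'Production';
DECLARE @TableName  sysname = N'Product';
DECLARE @ColumnName sysname;
DECLARE @SQL        nvarchar(max);

-- Collects one result row per NULLable column.
CREATE TABLE #NullPercent
(
    ColumnName  sysname       NOT NULL,
    NullPercent decimal(5, 2) NOT NULL
);

-- Only NULLable columns can contain NULLs, so the others are skipped.
DECLARE ColumnCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = @SchemaName
      AND TABLE_NAME   = @TableName
      AND IS_NULLABLE  = 'YES'
    ORDER BY ORDINAL_POSITION;

OPEN ColumnCursor;
FETCH NEXT FROM ColumnCursor INTO @ColumnName;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Window aggregates over the whole table repeat the same value on every row,
    -- so TOP (1) is enough to capture the percentage for this column.
    SET @SQL = N'INSERT INTO #NullPercent (ColumnName, NullPercent)
        SELECT TOP (1) @ColumnName,
               CAST(SUM(CASE WHEN ' + QUOTENAME(@ColumnName)
        + N' IS NULL THEN 1 ELSE 0 END) OVER () * 100.0
                    / COUNT(*) OVER () AS decimal(5, 2))
        FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';';

    EXEC sys.sp_executesql @SQL, N'@ColumnName sysname', @ColumnName = @ColumnName;

    FETCH NEXT FROM ColumnCursor INTO @ColumnName;
END

CLOSE ColumnCursor;
DEALLOCATE ColumnCursor;

-- One row per NULLable column, highest NULL percentage first.
SELECT ColumnName, NullPercent
FROM #NullPercent
ORDER BY NullPercent DESC, ColumnName;

DROP TABLE #NullPercent;
```

Because each column gets its own row, the output reads as a vertical list rather than one wide row. The trade-off is one table scan per NULLable column, and an empty table produces no rows at all.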
As you can see, all three of the above solutions provide the intended result we were looking for, but do so in creatively different styles, this time including our first CURSOR based solution. I hope that you are able to learn a lot by trying out the problem yourself and then reading through the additional solutions.
Special thanks to Jens, Naomi, and Jingyang for their valuable forums contribution and for contributing to this series!
Hope that helps,
Sam Lester (MSFT)
Jens Suessmeyer is an MCS Consultant in Germany and has been working with SQL Server since version 6.0. He is very active in the forums and in his blog (http://blogs.msdn.com/b/jenss), providing outstanding real-world solutions and insight.
Naomi Nosonovsky, Senior Software Developer, has more than 15 years of enterprise experience in analysis, design, development and implementation of high-performance client/server data management solutions. She is a Personality of the Year for 2008, 2009, 2010 and 2011 at www.universalthread.com in .NET, SQL Server & VFP categories. She is also an All Star contributor/MCC Community Contributor at forums.asp.net and MSDN T-SQL forum. She also actively blogs at http://blogs.lessthandot.com/index.php/All/?disp=authdir&author=218 and http://beyondrelational.com/members/naomi/modules/2/posts.aspx?Tab=16.
Jingyang Li has been working with SQL Server since he began his IT career as an ASP.NET developer in 2001. He enjoys working with T-SQL and recently took a full time job as a DBA. He has been participating in the Microsoft forums under the alias Limno.