SharePoint and Database Statistics: Why are they out-of-date and what to do about it.


I was recently approached by an old friend and MCS colleague (Greg Moss) who was working on a special project for a mutual customer.  This project was to conduct a health review of the database environment used to support a very large SharePoint farm and came as a result of some unexplainable SQL performance problems that had hindered their environment for a few days.  It turns out that the problem was un-related to what I will discuss in this post but what we found as part of the investigation was very interesting and something worth sharing. 

Here is the story:

Greg:  Hey Eric, It’s Greg.  Have time for a quick question?

Eric: Sure Greg, what’s up?

Greg: Well, as you know I am performing this health check and as part of my investigation I am seeing some weird things from a SharePoint/SQL Server perspective that I hope you can explain.

Eric:  Okay, sure.  What’s going on?

Greg:  Can you explain to me what database maintenance operations SharePoint performs?  And how?

Eric: Well, sure.  SharePoint has some built-in health checks that are implemented as timer jobs and run at various intervals (hourly, daily and weekly).  What specifically are you looking at?

Greg: Database Statistics

Eric: Okay.  That health check is part of a database maintenance job that runs daily and by default is scheduled to run at midnight.  This timer job performs an update statistics operation on all tables within SharePoint databases.

Greg:  Okay.  Then can you explain why some of the content databases have statistics that were updated last night and other databases haven’t had statistics updated for over a month?

Eric: Um.  No.  That is weird.  How do you know that some statistics were updated and others were not?

Greg:  You can run a query to display when statistics were updated.   Running the following query will show the statistics information for each database.

EXECUTE sp_msforeachdb 
'USE [?];
IF DB_NAME() NOT IN(''master'',''msdb'',''tempdb'',''model'')
   begin 
        SELECT  ''CHECKING STATS FOR '' + DB_NAME() AS ''DATABASE NAME''
        SELECT   OBJECT_NAME(A.OBJECT_ID) AS ''TABLE NAME''
               , A.NAME AS ''INDEX NAME''
               , STATS_DATE(A.OBJECT_ID,A.INDEX_ID) AS ''STATS LAST UPDATED''
          FROM   SYS.INDEXES A
          JOIN   SYS.OBJECTS B
            ON   B.OBJECT_ID = A.OBJECT_ID 
         WHERE   B.IS_MS_SHIPPED = 0
         ORDER   BY OBJECT_NAME(A.OBJECT_ID),A.INDEX_ID
     end'

Eric:  That’s a nice trick. I’ll have to remember that one. What did you see?

Greg:  When I reviewed the data from the production server, I saw that some statistics are getting refreshed and others are not (even within the same database) and I see some databases haven’t had statistics updated in months. The customer says that SharePoint shows the database maintenance jobs have run every day for months.

Eric:  We can verify that.  Let’s look through the SharePoint logs and see if we can see find any entries with an event id of ‘cm1x’ and ‘cm1y’.  We should see an event containing ‘cm1x’ at the start of the update statistics operation and then an additional ‘cm1y’ event for each content database for which which we execute the update statistics command.

Greg:  Okay. Let me check… Yep, I see those entries. They look like this:

Date Time

Process

Category

Event Id

Level

Message

10/17/2012 00:00:06.22

OWSTIMER.EXE (0x0560)

Health

cm1x

Medium

Updating statistics in all databases on Server.

10/17/2012 00:00:06.22

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, StateService_DB.

10/17/2012 00:00:06.22

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Secure_Store_Service_DB.

10/17/2012 00:00:06.22

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, WordAutomationServices_DB.

10/17/2012 00:00:06.23

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Bdc_Service_DB.

10/17/2012 00:00:06.23

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, WebAnalyticsServiceApplication_StagingDB.

10/17/2012 00:00:06.34

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, SharePoint_Config.

10/17/2012 00:00:11.03

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, PerformancePoint Service Application.

10/17/2012 00:00:11.03

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Search_Service_Application_CrawlStoreDB

10/17/2012 00:00:11.07

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Search_Service_Application_PropertyStoreDB.

10/17/2012 00:00:11.08

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Application_Registry_Service_DB

10/17/2012 00:00:11.08

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, SharePoint_AdminContent.

10/17/2012 00:00:11.85

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, WSS_Content.

10/17/2012 00:00:11.91

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, WSS_Content_80.

10/17/2012 00:00:12.39

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Search_Service_Application_DB.

10/17/2012 00:00:12.39

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, Managed Metadata Service_DB.

10/17/2012 00:00:12.39

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, WSS_Logging.

10/17/2012 00:00:25.55

OWSTIMER.EXE (0x0560)

Health

cm1y

Medium

Updating statistics in the database, WebAnalyticsServiceApplication_ReportingDB.

10/18/2012 00:00:29.49

OWSTIMER.EXE (0x1618)

Health

cm1x

Medium

Updating statistics in all databases on Server.

Eric:  Wow, that is interesting.  It looks like SharePoint is telling us that it is updating statistics even though it appears that it is really not.  I wonder what’s going on.

Greg: From a SQL perspective we can verify the data and see exactly when the statistics were last updated.  The SharePoint log indicates it ran an update every day even though we have verified that the date the statistics were last updated was over a month ago.

Eric: I am not sure what is going on here but this is weird. I have seen instances with other customers where statistics were not being updated nightly like they should have been but I never had time to look into it. Let me do some research and get back to you.

Greg:  Okay.  Thanks.  I’ll keep digging from a SQL perspective.

Eric:  Me too.  I’ll get back to you with what I find.

Our Findings

That conversation was the beginning of an investigation that led to some very interesting findings.  It turns out that the database maintenance timer job will fire off every night as expected. However, we learned that if a database is being backed up at the time the database maintenance job is running, the update statistics operation will be skipped.  Furthermore, we learned that it not just a full database backup that can cause the operation to be skipped but also a differential and transaction log backup as well.  What’s really troubling here, however, is that SharePoint does not give any indication to the user that the statistics update operation was skipped.  In fact, as we saw in the SharePoint logs above, it seems to lead one to believe that statistics were updated though they were in fact NOT updated.

Thinking about what we know about most SharePoint environments we realized that this is a BIG problem for a couple of reasons:

  • Many environments have nightly backup jobs scheduled to start at midnight which is the same time the SharePoint database maintenance jobs are scheduled to run by default.
  • More concerning, however, is the realization that SharePoint environments have databases configured for FULL recovery model and are running in highly available configurations such as log shipping or mirroring.  This increases the risk of encountering this issue.  These environments could be running transaction log backups as often as every 15 minutes or less and are at risk of the database maintenance operation never completing successfully.

This is an issue that is likely affecting thousands of SharePoint users all over the world.  I am personally familiar with dozens of cases (if not more) where users were experiencing performance problems due to SQL contention that were resolved by performing a manual update of statistics.  In many of these cases, the problem re-occurred in a few days or weeks only to be resolved by another update of statistics.

 

Our Recommendations

Based upon these findings we recommendation that every major SharePoint installation schedule a nightly SQL job that calls the proc_UpdateStatistics stored procedure in each database to ensure that index statistics are getting updated at least daily.

To accomplish this, create a SQL Agent job that runs at least daily and runs the following script:

 
EXECUTE sp_msforeachdb 
'USE [?];
IF DB_NAME() NOT IN(''master'',''msdb'',''tempdb'',''model'')
     begin 
          print ''updating statistics in database  ==> '' + db_name()
          if exists (select 1 from sys.objects where name = ''proc_updatestatistics'')
             begin
                  print ''updating statistics via proc_updatestatistics''
                  exec proc_updatestatistics 
             end
         else 
             begin 
                  print ''updating statistics via sp_updatestats''
                  exec sp_updatestats
             end
    end'

Credit

A huge thank you goes out to my friend Greg Moss for identifying this issue and digging into the SQL server side of things.  This is an issue that I have encountered in many environments and never dug deeper until Greg started asking questions.  This issue is an important issue and under investigation by the product development team.  There may likely be additional updates and findings to follow so stayed tuned.

# # # # #

Comments (11)

  1. Sree says:

    Nice post…

    So, what's your recommendation on "Auto Update Statistics" setting at the Database level?

  2. Sree,

    While turning on "Auto Update Statistics" sounds like a good idea, I was unable to get an official statement about whether it is supported.  I was able to get confirmation that calling proc_updatestatistcs" is supported and is the recommended approach.

    Furthermore, do NOT enable "Auto Create Statistics"  that is EXPLICITLY NOT SUPPORTED.

  3. Wow, thanks for this VERY interesting post ! I wonder if this will solve my "massive slow uploads" issue caused by this query "SELECT StatMan([SC0], [LC0]) FROM (SELECT TOP 100 PERCENT CONVERT([varbinary](200), SUBSTRING ([Content], 1, 100)++substring([Content], case when LEN([Content])<=200 then 101 else LEN([Content])-99 end, 100)) AS [SC0], datalength([Content]) AS [LC0] FROM [dbo].[AllDocStreams] WITH (READUNCOMMITTED)  ORDER BY [SC0] ) AS _MS_UPDSTATS_TBL"  that is 'stuck' every morning for about 20 minutes when the first user starts uploading something in a #SP2010 document library….

  4. Jeroen,

    Though I am not a SQL expert, I did some research on that query that you provided and it appears to be a query that you see on SQL Server 2005 when updating statistics.  It is not possible to determine from the information provided whether this is from a scheduled task or if it is something the SQL engine is doing automatically.  Refreshing statistics at a controlled regular frequency could help here.  I'd also check the server MAXDOP setting and make sure it is 1.

    HTH,

    Eric

  5. Delsk says:

    Eric, thanks for researching this for me and helping me identify that this has been fixed in the April 2013 CU for SharePoint 2010.

  6. Riccardo Celesti says:

    Great stuff, thank you! Does this article still apply to SharePoint 2013?

  7. EricA [MSFT] says:

    Riccardo,

    SP2013 is not affected by this issue.  Also, this was addressed in the April 2013 CU for Sp2010 so current installation of SP2010 should no longer be affected by this.

    Run the SQL to display current statistics status to verify.

    Thanks,

    Eric

  8. RobB says:

    I didn't find any reference to this issue in the April 2013 CU KB. What was fixed? Can the Stat job run during a backup/crawl or does it now report an error?

  9. EricA [MSFT] says:

    RobB – There were changes to the timer job code as well as changes to the stored procedure that gets called.   This timer job is not impacted by a crawl at all and it will successful run during a backup.  If you suspect this job is not doing what it is supposed to monitor the ULS logs for the entries above then run the include SQL to report when statistics/indices were last updated..  

  10. MariaV says:

    Great post!  Thanks for the information.  Do you happen to know if this also applies for the index reorg part of the Health Analyzer timer jobs (proc_DefragmentIndices is called)?

  11. Anonymous says:

    Awesome.  This caused us problems and money for years.