Directly querying SharePoint databases

One of the first rules of SharePoint that you learn when you start diving in to the product is that you shouldn't ever touch the SharePoint databases - it's fairly well known that you should leave them alone, but I've been asked the question a few times over the years "Well what if I'm just reading from the databases? Surely that's ok because I'm not making any changes to the data", so what I wanted to cover off in this post is why it's not a great idea to read from the databases and what the impact of doing this is.

Is reading the from the databases supported?

To answer this question the best place to start is a knowledge base article we have published at https://support.microsoft.com/kb/841057. This article discusses what is supported in regards to executing queries against the SharePoint databases, which I won't bother reproducing here but this is where the "don't touch the database" advice is formally documented. There is a section at the bottom of the page though that specifically calls out making reads from the database though, stating that this should be avoided as it can cause unexpected locking within the databases which can adversely affect performance and cause issues within the product.

What is locking in SQL Server?

So what exactly is locking and why does it matter? The short version of the answer here is that locking is how SQL server keeps resources isolated during a transaction to ensure that two transactions won't impact on each other. SQL server can apply locks automatically at different levels based on the impact of the query - for example, a query might impact on only a few rows which might result in SQL locking only just those rows, it might decide locking off a page or even up to locking a whole table. While a resource is locked other transactions won't be able to get to them, which is why SQL automatically manages where these locks apply to minimise the impact based on the query you have executed. When you execute queries against SQL server you can use the NOLOCK or READUNCOMMITTED table hints to specify that you want to make what is called a "dirty read" from the database, which will ignore current locks but you run the risk of retrieving data that isn't current which may result in errors in your logic.

Why database locks impact on performance

Putting it bluntly, if SharePoint is making requests to SQL and the queries have to wait for other locks to be released (or worse they time out all together) then it's going to appear slower to your end users while it loads the content they have requested - that simple. At any given time SharePoint will be making any number of calls to the databases for various things and this is not something you want to interfere with in ways that have not been tested and assessed by the product team - this is why SharePoint has a number of API options to query it's data, so you can get the information you need in more consistent and tested ways so that overall performance and stability of the environment won't risk being impacted by outside queries.

But using the API makes my application slower than querying SQL directly!

Yes we have an API and it lets you get to lots of data, but the reality of this matter is that the end result of using this is that it might take longer to get your data than querying the database directly. This however is the trade off you take for using SharePoint for your application. As a development platform SharePoint brings a lot to the table that you don't need to develop for yourself, but that comes at the cost of relying on using the API and it's inbuilt querying methods to get the data from SQL so that the platform can remain stable. Now this isn't to say that you can't have a negative impact on performance through using the API poorly, it's entirely possible to cause issues that way as well - but thanks to some performance management and throttling features built in to SharePoint this can be mitigated and managed, at least more so than a direct SQL query would be where the onus to get things write is entirely on the person writing the queries.

 


So that's a quick run through of what you should know about writing queries against SharePoint content databases - you might think that your read only queries are completely harmless, but be aware of the potential impact of what you are doing on the performance of your system, and how that might impact on support during a case. I hope that clears things up a little for you!