Best Performer: Distributed query (Four-part) or OPENQUERY when executing linked server queries in SQL Server

I wanted to share this information to all those who are searching for better performer when it comes to distributed query vs openquery for running linked server queries.

Distributed Query : Linked server four part queries are also called distributed queries. Using distributed queries,  you can refer tables on different data sources/servers in a single query. Query optimizer creates an execution plan by looking at the query nomenclature and breaks it into remote and local queries. Local queries are executed locally and data for remote queries are collected from the remote servers, scrubbed locally, combined together and presented to end user as single record set. 

OpenQuery : Executes the specified pass-through query on the specified linked server. SQL Server sends pass-through queries as un-interpreted query strings to an OLE DB data source . That is, SQL won’t apply any kind of logic on the query and won’t try to estimate what that query would do, it would simply pass the specified query as it is to the target linked server. Open queries are useful when you are not referencing multiple servers in one query. It’s generally fast as SQL does not  break it into multiple operations and does not perform any local action on the output received.

 So which is faster Distributed Query or Open query and why?

The answer is, generally, OPENQUERY would be faster but distributed queries could be as fast too.

For example lets’ say I have linked server between two SQL instances SQL1 and SQL2. And I have to do select count(*) on emp table in test database on remote server SQL2.

Distributed query would be something like SELECT count(*) FROM [SQL2].[test].[dbo].[emp]

OPENQUERY would be  SELECT * from OPENQUERY([SQL2], 'SELECT count(*) FROM [test].[dbo].[emp]')

If you look at the execution plan by running SET STATISTICS PROFILE ON, you can see that for executing distributed query, SQL1 sends a request to SQL2 to send the statistics information for table emp in the database test. Please note that the user account running this distributed query must have certain permissions in the remote server as documented in https://msdn.microsoft.com/en-us/library/ms175537.aspx to be able to gather data distribution statistics from the remote server else SQL Server might generate less efficient query plan and you will experience poor performance. UPDATE: This is going to be fixed in SQL Server 2012 so that if you have SELECT permission over columns of the index/statistics AND have SELECT permission over all columns of the FILTER (WHERE/HAVING) clause in the target server, you will be able to get the histogram. Important: The target or remote server has to have the SQL Server 2012 version with fix applied.

We have seen issues where too many connections running distributed queries end up in a SOSHOST_MUTEX wait while SQL Server is collecting data distribution statistics from the remote server. Also it has to be noted that a single query makes connection atleast two times to the remote server in case of distributed query, first connection to gather statistics and second connection to collect the actual data in the table.

Another disadvantage in case of distributed query is that though you have a WHERE clause in your query, you might notice that when the query is sent to retrieve the rows of a table in the remote server, SQL Server will just send a SELECT * FROM the remote table and then locally it filters out the necessary data after applying the predicates.

But in OPENQUERY, SQL Server sends the complete query to remote server SQL2 and resources of the SQL2 is spent in processing the query like parsing the SQL statements, generating a plan, filtering the rows as per predicates. Then the final resultset is sent to SQL1 which then just displayes what it received from SQL2.

Now, you tell me which one is better?

Further Reads:

Distributed Queries - https://msdn.microsoft.com/en-us/library/ms188721.aspx

OPENQUERY - https://msdn.microsoft.com/en-us/library/ms188427.aspx

Guidelines for Using Distributed Queries - https://msdn.microsoft.com/en-us/library/ms175129.aspx

How to pass a variable to a linked server query - https://support.microsoft.com/kb/314520

https://connect.microsoft.com/SQL/feedback/ViewFeedback.aspx?FeedbackID=475804 (Community request for permissions required on the remote server)