APS / PDW Best Practices: Linked Server: OpenQuery VS EXEC

Many users utilize Linked Server functionality in order to issue queries to PDW using SSMS.  This is completely supported and a practical way of processing.  However it is important to pay attention to how you are issuing the queries through PDW.  There are two popular methods, either using OPENQUERY or EXEC.   The latter is the preferred method and will result in much more efficient processing, lower resource utilization on the APS installation, and simplification in the DMV's mentoring query execution. Open query is supported, however is mainly present for backwards compatibility.  The behavior described below also occurs if issuing queries to a SQL instance.  For this reason SQL also discourges users form using this method and ar in favor of the EXEC execution method.

 

See below for a detailed write up of what happens when a query is issued through OPENQUERY and EXEC and what impact it can have on an APS appliance.

 

Issue

Large number of queries/Sessions issued to the appliance during a short time period.  Some of these include SP_PREPARE which the customer is not running.

 

Observations

  • During the peak times, the CPU on the CTL node had very high utilization.
  • There were a large number of queries running, many beyond the 32 concurrency limit
  • There were a large number of prepared statement/parametrized query executions
  • Also noticed explicit transactions and rollback

 

Findings

  • The queries are being executed using SQL server linked server openquery syntax, such as:

select * from openquery(cssc8a ,' select top 1 * from dbo.col_test')

When running this syntax internally, I had the following findings:

  • Two distinct sessions are being created to execute this query.

select * from sys.dm_pdw_exec_sessions

where session_id in ('SID148612' , 'SID148613')

order by login_time desc

 

session_id status request_id security_id login_name login_time query_count is_transactional client_id app_name sql_spid
SID148613 Closed NULL NULL sa 1/21/16 7:29 6 0 172.18.177.109:1090 Microsoft SQL Server 183
SID148612 Closed NULL NULL sa 1/21/16 7:29 9 0 172.18.177.109:1089 Microsoft SQL Server 183

 

 

These two sessions are executing a total of 15 queries, even though only one was submitted.

 

I can see these two sessions are running the prepared statements as well as an explicit transaction.

 

select * from sys.dm_pdw_exec_requests

where session_id = 'SID148612'

order by submit_time desc

 

select * from sys.dm_pdw_exec_requests

where session_id = 'SID148613'

order by submit_time desc

 

request_id session_id status submit_time start_time end_compile_time end_time total_elapsed_time label error_id database_id command resource_class
QID1912364 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 16 NULL NULL 31 SET NO_BROWSETABLE OFF NULL
QID1912362 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 exec [sp_unprepare] @P1 NULL
QID1912363 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 exec [sp_unprepare] @P1 NULL
QID1912359 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 31 NULL NULL 31 exec [sp_prepare] @P1 OUT, @P2, @P3, @P4 NULL
QID1912360 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 31 NULL NULL 31 exec [sp_prepare] @P1 OUT, @P2, @P3, @P4 NULL
QID1912361 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 31 NULL NULL 31 ;select top 1 * from dbo.col_test NULL
QID1912358 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 SET NO_BROWSETABLE ON NULL
QID1912357 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 SELECT @@SPID NULL
QID1912356 SID148612 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 USE [tim_sandbox] NULL
request_id session_id status submit_time start_time end_compile_time end_time total_elapsed_time label error_id database_id command resource_class
QID1912370 SID148613 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 125 NULL NULL 31 rollback NULL
QID1912369 SID148613 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 78 NULL NULL 31 ;select top 1 * from dbo.col_test smallrc
QID1912368 SID148613 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 begin tran NULL
QID1912367 SID148613 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 SET XACT_ABORT OFF NULL
QID1912366 SID148613 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 16 NULL NULL 31 SELECT @@SPID NULL
QID1912365 SID148613 Completed 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 1/21/16 7:29 0 NULL NULL 31 USE [tim_sandbox] NULL

 

We believe this is causing unnecessary overhead.  We know (in AU4 or earlier releases) that during times of high concurrency and a large amount of sessions are connecting, the CPU utilization will be increased on the control node and at some point new connections will begin to timeout.   This is currently a limitation of the PDW Engine process.  The limit this occurs varies with workload.  With the above symptoms, we are effectively doubling the amount of new session requests which will cause this limit to be hit with less intentional sessions than anticipated.  We are also doing a large amount of unnecessary work.  The total duration for both sessions is 328ms.

 

As a comparison, I executed the same query using the following syntax:

 

exec ('select top 1 * from dbo.col_test') at cssc8a

 

This appears to create a single session which issues 3 queries.

 

session_id status request_id security_id login_name login_time query_count is_transactional client_id app_name sql_spid
SID148614 Closed NULL NULL sa 1/21/2016 7:36:38.38 3 0 172.18.177.109:1380 Microsoft SQL Server 114

 

These three queries seem much more reasonable:

 

request_id session_id status submit_time start_time end_compile_time end_time total_elapsed_time label error_id database_id command resource_class
QID1912401 SID148614 Completed 1/21/2016 7:36:38 1/21/2016 7:36:38 1/21/2016 7:36:38 1/21/2016 7:36:38 31 NULL NULL 31 select top 1 * from dbo.col_test smallrc
QID1912400 SID148614 Completed 1/21/2016 7:36:38 1/21/2016 7:36:38 1/21/2016 7:36:38 1/21/2016 7:36:38 16 NULL NULL 31 SELECT @@SPID NULL
QID1912399 SID148614 Completed 1/21/2016 7:36:38 1/21/2016 7:36:38 1/21/2016 7:36:38 1/21/2016 7:36:38 0 NULL NULL 31 USE [tim_sandbox] NULL

 

 

 

Total execution time using openquery: 328ms

Total Execution time using exec:  47ms

 

 

Conclusion  

Using exec in lieu of openquery is far more efficient.  We see a reduction in the number of sessions created, queries issued to PDW, and total elapsed time when using exec; about a 700% improvement with a single query.