APS AU4 Feature Review: 1.5x data return rate improvement

The Microsoft engineering team has officially signed off on the latest APS update, Appliance Update 4 (AU4). In this post we will pick one of the AU4 features and discuss in more detail what the improvement is and what it means for PDW clients. The improvement that has raised the most questions is labeled "1.5x data return rate improvement for SELECT * queries improves external analytics (SAS, R) integration", so it seems like a good place to start.

What does this improvement actually mean? As the label suggests, it mostly targets analytics workloads where large data sets are streamed from the appliance into another application. Common examples of these applications are SSAS, SAS and R. Previous performance testing had shown that PDW appliances were measurably slower at returning data to a client than SMP SQL Server. Further investigation showed that the bottleneck was on the control (CTL) node. Even though we make every attempt not to land the data on the control instance, we still must stream the data through the control node when returning it to a client. In previous versions a single thread both received the results from the compute node SQL instances and sent those results to the client. Being a single thread, it could only do one operation at a time: either receive data or send data. This improvement introduces an additional thread to the process. There is now one thread constantly receiving the data streams from the compute nodes and a separate thread simultaneously sending those results to the client.
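To make the difference concrete, here is a minimal Python sketch of the two models. It is only an illustration of the general receive/send pattern, not the actual control node implementation: the function names, the bounded buffer and its size, and the `send` callback are all assumptions made for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the result stream


def single_threaded_return(compute_batches, send):
    """Old model (illustrative): one thread alternates between the two steps."""
    for batch in compute_batches:  # receive a batch from the compute nodes
        send(batch)                # then send it; no receiving happens meanwhile


def two_threaded_return(compute_batches, send, buffer_size=8):
    """New model (illustrative): one thread receives while another sends."""
    # Hypothetical bounded buffer between the two threads.
    buffer = queue.Queue(maxsize=buffer_size)

    def receiver():
        try:
            for batch in compute_batches:
                buffer.put(batch)  # blocks only when the buffer is full
        finally:
            buffer.put(_DONE)      # always tell the sender the stream ended

    def sender():
        while True:
            batch = buffer.get()   # blocks only when the buffer is empty
            if batch is _DONE:
                break
            send(batch)

    threads = [threading.Thread(target=receiver),
               threading.Thread(target=sender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    rows_out = []
    two_threaded_return(iter([[1, 2], [3, 4], [5]]), rows_out.append)
    print(rows_out)
```

With one receiver and one sender sharing a first-in, first-out buffer, the batches reach the client in the same order in which they were received. The gain comes from the receive and send steps overlapping instead of taking turns.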

What does this mean for my application? With this change, a single SELECT statement that does not perform any aggregation or ordering on the control node will see a 1.5x improvement in the performance of the return operation. Concurrent sessions will see the same improvement until other limits, such as IO or network bandwidth, begin to cap the total throughput. If you regularly run many concurrent return operations, you may be able to gain even more for your application by exploring a supported InfiniBand connection between the application server and the appliance.
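As a rough back-of-the-envelope illustration of what a 1.5x return rate means for elapsed time, the sketch below uses made-up numbers. The row count and the baseline rate are hypothetical, not measured values from an appliance; only the 1.5x factor comes from the feature description.

```python
def return_time_seconds(rows, rows_per_second, speedup=1.0):
    """Elapsed time of the return operation at a given (scaled) row rate."""
    return rows / (rows_per_second * speedup)


# Hypothetical workload: 600 million rows at a baseline of 1 million rows/sec.
baseline = return_time_seconds(600_000_000, 1_000_000)       # 600.0 seconds
improved = return_time_seconds(600_000_000, 1_000_000, 1.5)  # 400.0 seconds

print(f"baseline: {baseline:.0f}s, with 1.5x return rate: {improved:.0f}s")
```

In other words, a 1.5x rate improvement cuts the time spent in the return operation by about one third, assuming nothing else (network, IO, or the receiving application) becomes the limiting factor first.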

Are there any limitations to this improvement? As with all performance improvements, removing one bottleneck will uncover another. Now that data streams faster, you will want to take into consideration network bandwidth as well as the performance of the application receiving the rows. The improvement also does not apply to all scenarios. If the data must first be moved to the control node for further aggregation, this improvement will not be activated. You can easily tell whether this is the case by reviewing the explain plan for your query and checking whether the source table for the return operation exists on the control node or on the compute nodes. If the location is the compute nodes, you will benefit from the multi-threaded operation.