Exporting to a single file in SQL Data Warehouse

Azure SQL Data Warehouse includes PolyBase - a technology that accesses and combines both non-relational and relational data from T-SQL . It allows you to run queries on external data in Azure blob storage and can be used for importing or exporting data. External data is exposed through a concept called an external table (see MSDN for details on setting up an external table) - essentially a normal SQL table that is backed by data stored outside of the relational engine.

SQL Data Warehouse is a multi-node scale out relational SQL Server data warehouse. PolyBase has the capability of using all of the nodes in the SQL Data Warehouse cluster to import or export data in parallel. This is great for performance reasons (multiple nodes working in parallel can produce very large ingestion rates) but can cause some confusion if you're not aware of this design.

Let's take a look at an example.

Exporting Data Default

When exporting data, you'll use a T-SQL command to create an external table from a SELECT statement. We call this a CETAS (Create External Table As Select). The processing of moving data from one table to another via CETAS will invoke PolyBase to create an external storage mechanism for the data to land into. In this example, a set of files on Azure Blob Storage.

 CREATE EXTERNAL TABLE dbo.Orders
WITH
(
    LOCATION = '/Orders/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFile
)
AS
SELECT
*
FROM
dbo.OrderHistory;

This will produce 60 text files in Azure Blob Storage. This by design as PolyBase as the capability of reading multiple files in a directory or directory tree as the backing data for an External Table (a handy feature we’ll blog about in the future).

Exporting a Single File

But wait, what if I want to export all of my data to a single file? There is a handy trick that you can use – the TOP clause. Using the TOP clause forces the data to the control node in the service and the output will be sent to a single file:

 CREATE EXTERNAL TABLE dbo.Orders
WITH
(
    LOCATION = '/Orders/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFile
)
AS
SELECT TOP [N] 
*
FROM
dbo.OrderHistory;

Note: You’ll need to replace the [N] value with the row count of the table (if you’re doing a full export).

After executing the CETAS with the TOP clause, you’ll be left with a single file in Azure Blob Storage.

Next Steps

Visit the SQL Data Warehouse Overview to learn more about Microsoft’s scale out relational data warehouse.