I had a fantastic day (well, night in my timezone!) at the Big Data and Advanced Analytics Virtual Workshop - Day 1. You can download the slides and video recording from here.
I did two demos in "Session 5: Using Azure SQL Data Warehouse".
The first demo was to set up an Azure SQL Data Warehouse, explain the pause and resume feature, scale the Data Warehouse, and push some data into it using bcp. I used the following scripts and this file.
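As a rough sketch of the scaling step: changing the performance level (DWU) is done with T-SQL against the logical server's master database, while pause and resume are done from the Azure portal or with the PowerShell cmdlets Suspend-AzureRmSqlDatabase and Resume-AzureRmSqlDatabase. The database name MyDW and the target level DW400 below are placeholders, not values from the demo.

```sql
-- Scale the Data Warehouse to a different performance level (DWU).
-- Run against the master database; 'MyDW' and 'DW400' are placeholders.
ALTER DATABASE MyDW
MODIFY (SERVICE_OBJECTIVE = 'DW400');
```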
Create table in SQL DW
CREATE TABLE DimDate
(
DateId INT NOT NULL,
CalendarQuarter TINYINT NOT NULL,
FiscalQuarter TINYINT NOT NULL
)
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = ROUND_ROBIN
);
Run the bcp utility
bcp DimDate in <your directory>\DimDate2.txt -S <your DW Server name> -d <your DW Name> -U <your DW Username> -P <your DW Password> -q -c -t ','
Run SELECT query
SELECT * FROM DimDate
The second demo was more interesting. The goal was to demo joining tables pointing at two other data sources: the first being Azure Blob storage and the second an ORC table in HDInsight (which itself uses Azure Blob storage underneath).
For the Azure-Blob-as-a-data-source part, I created a container in my blob storage account and moved this file into it. Back in SQL Server Management Studio (SSMS), I ran the following queries.
Create a data source for Azure Blob
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://<yourcontainer>@<your-storage-account-path>',
CREDENTIAL = AzureStorageCredential
);
The credential 'AzureStorageCredential' was pre-created by running the following queries.
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH
IDENTITY = 'user',
SECRET = '<your storage key>';
Create external table
CREATE EXTERNAL TABLE [asb].DimProduct
(
[ProductID] [int] NOT NULL,
[ProductName] [nvarchar](255) NULL,
[ProductDescription] [nvarchar](500) NULL
)
WITH
(
LOCATION = '/Products.txt',
DATA_SOURCE = AzureStorage,
FILE_FORMAT = PipeDeliemitedTextFileFormat,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
)
The schema [asb] and the FILE_FORMAT PipeDeliemitedTextFileFormat were pre-created using the following queries.
CREATE SCHEMA [asb]
CREATE EXTERNAL FILE FORMAT PipeDeliemitedTextFileFormat
WITH
( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = '|'
, STRING_DELIMITER = ''
, DATE_FORMAT = 'yyyy-MM-dd HH:mm:ss.fff'
, USE_TYPE_DEFAULT = FALSE
)
);
Show data from external table
SELECT * FROM [asb].DimProduct
This showed how data residing in Azure Blob storage can be queried straight from SQL Data Warehouse.
For the HDInsight part, I needed to create an HDInsight cluster and set it up. I ran the following queries in the Hive Editor of HDI.
Create an external table in HDI
CREATE EXTERNAL TABLE IF NOT EXISTS <your HDI database>.salestxt
(
salesdate int,
productid int,
salesamt float
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n' STORED AS TEXTFILE;
The database was pre-created using the following script.
CREATE DATABASE IF NOT EXISTS <your HDI database>;
Load data into external table
LOAD DATA INPATH 'wasb:///Sales.txt' INTO TABLE <your HDI database>.salestxt;
I had created this file as Sales.txt. For the load to work, I had to keep it at the root location of the HDInsight container in my storage account.
Create ORC table
CREATE TABLE IF NOT EXISTS <your HDI database>.salesorc
(
salesdate int,
productid int,
salesamt float
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC;
Load data into ORC table
INSERT OVERWRITE TABLE <your HDI database>.salesorc SELECT * FROM <your HDI database>.salestxt;
Once the HDInsight setup was done, I configured SQL Data Warehouse to connect to it using the following queries.
Create data source for HDInsight
CREATE EXTERNAL DATA SOURCE ORCAzureStorage
WITH
(
TYPE = HADOOP,
LOCATION = 'wasbs://<your HDI storage container>@<your storage account path>',
CREDENTIAL = AzureStorageCredential
);
Credential 'AzureStorageCredential' was re-used from the Azure Blob Data Source.
Create external table
CREATE EXTERNAL TABLE [hdp].FactSales (
salesdate int,
productid int,
salesamt float
)
WITH (
LOCATION ='/hive/warehouse/<your HDI database>/salestxt',
DATA_SOURCE = ORCAzureStorage ,
FILE_FORMAT = PipeDeliemitedTextFileFormat
);
The FILE_FORMAT 'PipeDeliemitedTextFileFormat' was re-used from the Azure Blob demo. The schema [hdp] was created using the query CREATE SCHEMA [hdp].
Show data from external table
SELECT * FROM [hdp].FactSales
I then showed the output of a query joining tables pointing at the two data sources, i.e. Azure Blob and HDInsight, with SQL Data Warehouse.
SELECT * FROM
[dbo].DimDate a
JOIN [hdp].FactSales c
ON a.dateid = c.salesdate
JOIN [asb].DimProduct b
ON b.productid = c.productid
Day 2 of the event also promises to be an exciting one with more sessions.
If you have not registered already, register here. See you there!