APS 2016 External Object Changes

APS customers who use Polybase and will be soon upgrading to APS 2016 from AU5 or prior should consider this change and take action prior to upgrading.

APS 2016 upgrades the underlying SQL instances to SQL 2016. Along with that comes Polybase that was added to SQL 2016. Polybase is a great feature to see added to SQL Server, but there are some fundamental changes that were made. In my previous blog post about Polybase for SQL Server, the external objects that you create are all at the database level. With the previous releases of Polyabse that were only in APS, the External Data Sources and External File Formats were instance level objects which means they are stored in the master database. This is easily identifiable in SSDT by browsing to the objects:

When you upgrade to APS 2016, the resources are not moved. This means the external data sources and file formats still exist in the master database for backwards compatibility. The reason is that these may be used by several external tables and/or referenced in stored procedures, SQL commands stored in SSIS packages, etc. There really is no good way to determine which database(s) should house these objects programmatically without risk of breaking any processes. Luckily, after you upgrade, the external tables still work just fine referencing the external data sources and file formats in master.

There is a problem, however. If you have any scripts that reference catalog views to query the definition of external tables, you need to join to the local sys.external_data_sources and sys.external_file_formats for newer objects but join to the system views in the master database for the pre-APS2016 objects. This is an inconsistency that is not desired. Currently, there is a bug in SQL Server Data Tools that returns an error when you try to script out a backwards compatible external table. This means that if you upgrade to APS2016, your external tables will work, but you can't script them with SSDT (right click>>view code)! This is due to the difference in catalog views. SSDT joins to the local sys.external_data_sources and sys.external_file_formats.

This query will show you metadata for external tables referencing local data sources and file fomats:

[sql]
SELECT
col.name AS column_name,
col.column_id,
typ.user_type_id AS column_type,
col.max_length,
col.precision,
col.scale,
col.collation_name,
col.is_nullable,
ext.location,
ds.name,
ff.name,
ext.reject_type,
ext.reject_sample_value,
ext.reject_value
FROM sys.all_columns col
JOIN sys.types typ ON typ.system_type_id = col.system_type_id
JOIN sys.external_tables ext ON ext.object_id = col.object_id
JOIN sys.external_data_sources ds ON ext.data_source_id = ds.data_source_id
JOIN sys.external_file_formats ff ON ext.file_format_id = ff.file_format_id
ORDER BY column_id;
[/sql]

This query will show you metadata for external tables referencing data sources and file fomats in master:

[sql]
SELECT
col.name AS column_name,
col.column_id,
typ.user_type_id AS column_type,
col.max_length,
col.precision,
col.scale,
col.collation_name,
col.is_nullable,
ext.location,
ds.name,
ff.name,
ext.reject_type,
ext.reject_sample_value,
ext.reject_value
FROM sys.all_columns col
JOIN sys.types typ ON typ.system_type_id = col.system_type_id
JOIN sys.external_tables ext ON ext.object_id = col.object_id
JOIN master.sys.external_data_sources ds ON ext.data_source_id = ds.data_source_id
JOIN master.sys.external_file_formats ff ON ext.file_format_id = ff.file_format_id
ORDER BY column_id;
[/sql]

CALL TO ACTION:

When planning your upgrade to APS2016, the best practice is to identify all your external data sources and file formats and to which database(s) they belong and then follow these steps:

  1. Before upgrading, script out all external objects (data sources, file formats, and external tables).
  2. Drop these objects during your outage window just before starting the upgrade to APS2016
  3. After the upgrade, recreate all the objects in the appropriate database. Note that it is possible that some external objects may be in multiple databases.

Following this best practice will align with SQL Server Polybase as well as prevent you from hitting the SSDT bug.