PolyBase Setup Errors and Possible Solutions

Blog Authors: Murshed Zaman and Sumin Mohanan

Reviewer(s): Barbara Kess

Prologue

PolyBase is a new feature in SQL Server 2016. It was popularized by APS (Microsoft Analytics Platform System) and Azure SQL DW. PolyBase allows access to relational and non-relational data from SQL Server using familiar T-SQL language. It allows you to run queries on external data that resides in Hadoop or Azure blob storage. Optionally, it can push query operations to Hadoop.  If you are interested in learning more about PolyBase, you can look at the PolyBase Guide on MSDN.

PolyBase setup is well documented on MSDN. But since the feature has connectivity to data that resides outside of SQL Server, like Hadoop, some of our customers have had situations for which it was necessary to involve engineering to understand and guide the situation to a successful installation and/or data access.

The purpose of this document is to highlight some of these issues we have seen with various customers and the possible solutions. Keep in mind these are some of the errors that we have seen and mitigated. The error message you see, although it may look the same, may have a different solution.  We are assuming that if you are using PolyBase with Hadoop, you know the basics of Hadoop and know how to look at various Hadoop logs.

Customer Scenario:

SQL Server 2016 or SQL DW connected to Azure blob storage. The CREATE EXTERNAL TABLE DDL points to a directory (and not a specific file) and the directory contains files with different schemas.

SSMS Error:

Select query on the external table gives the following error:

Msg 7320, Level 16, State 110, Line 14
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/nation/sensors.ldjson.txt)Column ordinal: 0, Expected data type: INT, Offending value: {"id":"S2740036465E2B","time":"2016-02-26T16:59:02.9300000Z","temp":23.3,"hum":0.77,"wind":17,"press":1032,"loc":[-76.90914996169623,38.8929314364726]} (Column Conversion Error), Error: Error converting data type NVARCHAR to INT.

Keep in mind there may be derivations of this error. The name of the first rejected file shows in SSMS with offending data types or values.

Possible Reason:

The reason this error happens is because each file has different schema. The PolyBase external table DDL when pointed to a directory recursively reads all the files in that directory. When a column or data type mismatch happens, this error could be seen in SSMS.

Possible Solution:

If the data for each table consists of one file, then use the filename in the LOCATION section prepended by the directory of the external files. If there are multiple files per table, put each set of files into different directories in Azure Blob Storage and then you can point LOCATION to the directory instead of a particular file. The latter suggestion is the best practices recommended by SQLCAT even if you have one file per table.

Example:

Create External Table foo
(col1 int)WITH (LOCATION='/bar/foobar.txt',DATA_SOURCE…); OR
Create External Table foo
(col1 int) WITH (LOCATION = '/bar/', DATA_SOURCE…);

Customer Scenario:

SQL Server 2016 is setup to access a supported Hadoop Cluster. Kerberos security is not enforced in Hadoop Cluster.

SSMS Error:

Select on the external table gives the following error:

Msg 105019, Level 16, State 1, Line 55
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.'
Msg 7320, Level 16, State 110, Line 55
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.'

DW Engine Server Log:

Interrogation of DWEngine Server log shows the following error:
Note: DWEngine_server.log is usually located under Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Log\Polybase\

3/15/2016 5:29:01 PM [Thread:16432] [EngineInstrumentation:EngineQueryErrorEvent] (Error, High):
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.'
Microsoft.SqlServer.DataWarehouse.Common.ErrorHandling.MppSqlException: EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.' ---> Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException: Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.
at Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.ExternalHadoopBridge.OpenBridge()
at Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsBridgeFileAccess.GetFileMetadata(String filePath)
at Microsoft.SqlServer.DataWarehouse.Sql.Statements.HadoopFile.ValidateFile(ExternalFileState fileState)

Possible Reason:

Kerberos is not enabled in Hadoop Cluster, but Kerberos security is enabled in core-site.xml, yarn-site.xml, or the hdfs-site.xml that resides in SQL Server 2016 (usually located under Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf).

Possible Solution:

Comment out the Kerberos security information from the above mentioned files in the SQL Server 2016 instance.

Customer Scenario:

SQL Server 2016 is setup to access a supported Hadoop Cluster or Azure Blob Storage.

SSMS Error:

Select on an external table shows the following error:

Msg 8680, Level 17, State 5, Line 118
Internal Query Processor Error: The query processor encountered an unexpected error during the processing of a remote query phase.

DWEngine Server Log:

[Thread:5216] [ControlNodeMessenger:ErrorEvent] (Error, High): ***** DMS System has disconnected nodes :
[Thread:5216] [ControlNodeMessenger:ErrorEvent] (Error, High): ***** DMS System has disconnected nodes :
[Thread:5216] [ControlNodeMessenger:ErrorEvent] (Error, High): ***** DMS System has disconnected nodes :

Possible Reason:

The reason for this error could be that SQL Server was not restarted after configuring PolyBase.

Possible Solution:

Restart SQL Server. Check DWEngine Server Log to make sure you don't see DMS disconnections after the restart.

Customer Scenario:

SQL Server 2016 is connected to an unsecured Hadoop cluster (Kerberos is not enabled). PolyBase is configured to push computation to Hadoop cluster.

Query: select count(*) from foo WITH (FORCE EXTERNALPUSHDOWN);

SSMS Error:

Msg 105019, Level 16, State 1, Line 1
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to JobSubmitter_PollJobStatus: Error [java.net.ConnectException: Call From big1506sql2016/172.16.1.4 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see:  https://wiki.apache.org/hadoop/ConnectionRefused] occurred while accessing external file.'
OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "Unspecified error".
Msg 7421, Level 16, State 2, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI11" for linked server "(null)". .

Hadoop Yarn Log Error:

Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=pdw_user, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232) org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5525)

Possible Reason:

With Kerberos disabled, PolyBase will use pdw_user as the user for accessing HDFS and submitting MapReduce jobs.

Possible Solution:

Create pdw_user on Hadoop and give it sufficient permissions to the directories used during mapreduce processing. Also make sure that pdw_user is the owner of the /user/pdw_user HDFS directory.

Below is an example of how to create home directory and assign permissions for pdw_user:

sudo -u hdfs hadoop fs -mkdir /user/pdw_user
sudo -u hdfs hadoop fs -chown pdw_user /user/pdw_user

After this make sure that pdw_user has read, write, and execute permissions on /user/pdw_user directory and /tmp directory has 777 permissions.

Customer Scenario:

SQL Server 2016 PolyBase is setup with Hadoop Cluster or Azure Blob Storage.

SSMS Error:

Any Select query fails with the following error.
Msg 106000, Level 16, State 1, Line 1
Java heap space

Possible Reason:

Illegal input may cause the java out of memory error.  In this particular case the file was not in UTF8 format. DMS tries to read the whole file as one row since it cannot decode the row delimiter and runs into Java heap space error.

Possible Solution:

Convert the file to UTF8 format since PolyBase currently requires UTF8 format for text delimited files.

Customer Scenario:

Customer's intention is to setup SQL Server 2016 PolyBase to connect to Azure Blob Storage.

SSMS Error:

Msg 105019, Level 16, State 1, Line 74
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs] occurred while accessing external file.'

Possible Reason:

Hadoop connectivity is not set to the configuration value for accessing Azure Blob storage.

Possible Solution:

Set the Hadoop connectivity to a value (preferably 7) which supports Azure Blob Storage and restart SQL Server. List of connectivity values and supported types can be found at https://msdn.microsoft.com/en-us/library/mt143174.aspx.

Customer Scenario:

Trying to export data to Azure blob storage or Hadoop file system using PolyBase with CREATE EXTERNAL TABLE AS SELECT (CETAS) syntax from SQL Server 2016.

SSMS Error:

Msg 156, Level 15, State 1, Line 177
Incorrect syntax near the keyword 'WITH'.
Msg 319, Level 15, State 1, Line 177
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.

Reason:

CETAS is not a supported statement in SQL Server 2016 for PolyBase.

Solution:

Create the external table first and then use INSERT INTO SELECT to export to the external location. More details can be found at https://msdn.microsoft.com/en-us/library/mt652313.aspx

Customer Scenario:

SQL DW is setup to import data from Azure blob storage. Create external table fails with the following message.

SSMS Error:

Msg 105019, Level 16, State 1, Line 34
External TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.: Error [com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.] occurred while accessing external file.'

Possible Reason:

Wrong Azure storage key was used to create the database scoped credential.

Possible Solution:

Drop all related objects (i.e: data source, file format) and then drop and recreate the database scoped credential with the right storage key.

Customer Scenario:

SQL Server 2016 is setup with Kerberos enabled Cloudera Cluster. SQL Server has been restarted after all the configuration changes. PolyBase Engine and PolyBase Data Movement services are running after restart.

SSMS Error:

Data Source configured without job tracker location:  org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.viewfs.ViewFileSystem could not be instantiated

Data Source configured with job tracker location:  Error [Can't get Kerberos realm] occurred while accessing external file

Possible Reason:

The value for the "hadoop.security.authentication" property says kerberos in the Coresite.xml.

Possible Solution:

Coresite.xml's "hadoop.security.authentication" property should be KERBEROS (all upper case) as the value. 

Customer Scenario:

SQL Server 2016 or APS is setup with supported HDP Cluster. Queries without pushdown works but fails with the following message when 'force pushdown' hint is used.

SSMS Error:

Msg 7320, Level 16, State 110, Line 35
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to JobSubmitter_PollJobStatus: Error [org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:277)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
] occurred while accessing external file.'

Possible Reason:

Mapred-site.xml is missing some needed values that checks for intermediate and end results.

Possible Solution:

Add the following properties and associate the correct values as it shows on Ambari in the mapred-site.xml file on SQL Server. 

<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>

Customer Scenario:

SQL Server 2016 is setup to access a supported Hadoop Cluster.

SSMS Error:

Creating an external table gives the following error:

Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.DmsSqlNativeException, Message: SqlNativeBufferReader.Run, error in OdbcExecuteQuery: SqlState: 42000, NativeError: 8680, 'Error calling: SQLExecDirect(this->GetHstmt(), (SQLWCHAR *)statementText, SQL_NTS), SQL return code: -1 | SQL Error Info: SrvrMsgState: 26, SrvrSeverity: 17,  Error <1>: ErrorMsg: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Internal Query Processor Error: The query processor encountered an unexpected error during the processing of a remote query phase. | Error calling: pReadConn->ExecuteQuery(statementText, bufferFormat) | state: FFFF, number: 24, active connections: 8', Connection String: Driver={pdwodbc};APP=RCSmall-DmsNativeReader:WAD1D16HD2001\mpdwsvc (3600)-ODBC-PoolId1433;Trusted_Connection=yes;AutoTranslate=no;Server=\\.\pipe\sql\query

          OR

Selecting from an external table gives the following error:

8/4/2016 8:19:40 PM [Thread:30544] [AbstractReaderWorker:ErrorEvent] (Error, High): QueryId QID2433 PlanId 6c3a4551-e54c-4c06-a5ed-a8733edac691 StepId 7:
Could not obtain block: BP-1726738607-192.168.225.121-1443123675290:blk_1159687047_86196509 file=/user/hive/warehouse/u_data/000000_0
Microsoft.SqlServer.DataWarehouse.Common.ErrorHandling.MppSqlException: Could not obtain block: BP-1726738607-192.168.225.121-1443123675290:blk_1159687047_86196509 file=/user/hive/warehouse/u_data/000000_0
at Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsBridgeReadAccess.Read(MemoryBuffer buffer, Boolean& isDone)
at Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.DataReader.ExternalMoveBufferReader.Read()
at Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.ExternalMoveReaderWorker.ReadAndSendData()
at Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.ExternalMoveReaderWorker.Execute(Object status)

Possible Reason:

This error message can show up when the Hadoop Cluster is setup in a configuration where data nodes are only accessible outside the cluster using the Hostname and not the IP address.

Possible Solution:

Add the following to hdfs-site.xml file on the client (SQL Server) side. This configuration will force the name node to return a URI for the data nodes with the Hostname instead of the internal IP address.

<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>

Customer Scenario:

SQL Server 2016 is running a PolyBase query on a directory with a large number of files (>30,000 files under the directory path recursively).

SSMS Error:

Msg 105019, Level 16, State 1, Line 1
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_GetFileNameByIndex. Java exception message:
GC overhead limit exceeded: Error [GC overhead limit exceeded] occurred while accessing external file.'

          AND/OR

Msg 105019, Level 16, State 1, Line 1
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_GetDirectoryFiles. Java exception message:
Java heap space: Error [Java heap space] occurred while accessing external file.'

Possible Reason:

When processing a path, PolyBase enumerates all files under that path, and there is fixed memory overhead associated with the data structure that is used to represent the files. With a large number of files, this overhead becomes noticeable and can eventually consume all memory available to the JVM.

Possible Solution:

Rearrange the data in multiple directories so that each directory contains a subset of files and then break down the query in multiple ones that operate on a part of the original path at a time and materialize the tables as SQL Server tables (before joining them).

Example: Let’s assume your external table data is in the following location:
Orders/file1.txt,...,file30K.txt.

  1. Change the layout so the data is laid out in a conventional file partition structure in Orders/yyyy/mm/dd/file1.txt (where yyyy/mm/dd are actual year/month/days).
  2. Point your external table to a lower directory path like month(mm) or day(dd) and import the files into SQL Server 2016 tables in pieces and then add them as part of one table.

Even if you had the right directory structure to begin with, follow step #2 to be able to work with that many files without running out of JVM memory.

Customer Scenario:

Setting up SQL Server 2016 or APS with a Hadoop cluster which involves modifying yarn-site.xml, hdfs-site.xml and other configuration files. We have seen this with 2 customers now and felt it was worth mentioning here.

SSMS Error:

Msg 105019, Level 16, State 1, Line 1

Microsoft.SqlServer.DataWarehouse.Common.ErrorHandling.MppSqlException: EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect. Java exception message:com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.: Error [com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.] occurred while accessing external file.' --->

Possible Reason:

This can happen if you have copied and pasted some lines of instruction from a website or Skype chat window. These sources that has it's own formatting and can produce unwanted/unprintable characters which when found in the configuration files will show error message as shown above.

Possible Solution:

Open the files in a smart text editor (something other than notepad) and look for these characters and eliminate them. Restart the necessary services.

Epilogue

This blog does not cover all the issues you can encounter when accessing data by using PolyBase. We plan to revise this document regularly with additional scenarios for the benefit of the masses.