Ask Learn
Preview
Please sign in to use this experience.
Sign inThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Blog Authors: Murshed Zaman and Sumin Mohanan
Reviewer(s): Barbara Kess
PolyBase is a new feature in SQL Server 2016. It was popularized by APS (Microsoft Analytics Platform System) and Azure SQL DW. PolyBase allows access to relational and non-relational data from SQL Server using familiar T-SQL language. It allows you to run queries on external data that resides in Hadoop or Azure blob storage. Optionally, it can push query operations to Hadoop. If you are interested in learning more about PolyBase, you can look at the PolyBase Guide on MSDN.
PolyBase setup is well documented on MSDN. But since the feature has connectivity to data that resides outside of SQL Server, like Hadoop, some of our customers have had situations for which it was necessary to involve engineering to understand and guide the situation to a successful installation and/or data access.
The purpose of this document is to highlight some of these issues we have seen with various customers and the possible solutions. Keep in mind these are some of the errors that we have seen and mitigated. The error message you see, although it may look the same, may have a different solution. We are assuming that if you are using PolyBase with Hadoop, you know the basics of Hadoop and know how to look at various Hadoop logs.
SQL Server 2016 or SQL DW connected to Azure blob storage. The CREATE EXTERNAL TABLE DDL points to a directory (and not a specific file) and the directory contains files with different schemas.
Select query on the external table gives the following error:
Msg 7320, Level 16, State 110, Line 14
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/nation/sensors.ldjson.txt)Column ordinal: 0, Expected data type: INT, Offending value: {"id":"S2740036465E2B","time":"2016-02-26T16:59:02.9300000Z","temp":23.3,"hum":0.77,"wind":17,"press":1032,"loc":[-76.90914996169623,38.8929314364726]} (Column Conversion Error), Error: Error converting data type NVARCHAR to INT.
Keep in mind there may be derivations of this error. The name of the first rejected file shows in SSMS with offending data types or values.
The reason this error happens is because each file has different schema. The PolyBase external table DDL when pointed to a directory recursively reads all the files in that directory. When a column or data type mismatch happens, this error could be seen in SSMS.
If the data for each table consists of one file, then use the filename in the LOCATION section prepended by the directory of the external files. If there are multiple files per table, put each set of files into different directories in Azure Blob Storage and then you can point LOCATION to the directory instead of a particular file. The latter suggestion is the best practices recommended by SQLCAT even if you have one file per table.
Example:
Create External Table foo
(col1 int)WITH (LOCATION='/bar/foobar.txt',DATA_SOURCE…); OR
Create External Table foo
(col1 int) WITH (LOCATION = '/bar/', DATA_SOURCE…);
SQL Server 2016 is setup to access a supported Hadoop Cluster. Kerberos security is not enforced in Hadoop Cluster.
Select on the external table gives the following error:
Msg 105019, Level 16, State 1, Line 55
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.'
Msg 7320, Level 16, State 110, Line 55
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.'
Interrogation of DWEngine Server log shows the following error:
Note: DWEngine_server.log is usually located under Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Log\Polybase\
3/15/2016 5:29:01 PM [Thread:16432] [EngineInstrumentation:EngineQueryErrorEvent] (Error, High):
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.'
Microsoft.SqlServer.DataWarehouse.Common.ErrorHandling.MppSqlException: EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.' ---> Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException: Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.
at Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.ExternalHadoopBridge.OpenBridge()
at Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsBridgeFileAccess.GetFileMetadata(String filePath)
at Microsoft.SqlServer.DataWarehouse.Sql.Statements.HadoopFile.ValidateFile(ExternalFileState fileState)
Kerberos is not enabled in Hadoop Cluster, but Kerberos security is enabled in core-site.xml, yarn-site.xml, or the hdfs-site.xml that resides in SQL Server 2016 (usually located under Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf).
Comment out the Kerberos security information from the above mentioned files in the SQL Server 2016 instance.
SQL Server 2016 is setup to access a supported Hadoop Cluster or Azure Blob Storage.
Select on an external table shows the following error:
Msg 8680, Level 17, State 5, Line 118
Internal Query Processor Error: The query processor encountered an unexpected error during the processing of a remote query phase.
[Thread:5216] [ControlNodeMessenger:ErrorEvent] (Error, High): ***** DMS System has disconnected nodes :
[Thread:5216] [ControlNodeMessenger:ErrorEvent] (Error, High): ***** DMS System has disconnected nodes :
[Thread:5216] [ControlNodeMessenger:ErrorEvent] (Error, High): ***** DMS System has disconnected nodes :
The reason for this error could be that SQL Server was not restarted after configuring PolyBase.
Restart SQL Server. Check DWEngine Server Log to make sure you don't see DMS disconnections after the restart.
SQL Server 2016 is connected to an unsecured Hadoop cluster (Kerberos is not enabled). PolyBase is configured to push computation to Hadoop cluster.
Query: select count(*) from foo WITH (FORCE EXTERNALPUSHDOWN);
Msg 105019, Level 16, State 1, Line 1
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to JobSubmitter_PollJobStatus: Error [java.net.ConnectException: Call From big1506sql2016/172.16.1.4 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: https://wiki.apache.org/hadoop/ConnectionRefused] occurred while accessing external file.'
OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "Unspecified error".
Msg 7421, Level 16, State 2, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI11" for linked server "(null)". .
Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=pdw_user, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232) org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5525)
With Kerberos disabled, PolyBase will use pdw_user as the user for accessing HDFS and submitting MapReduce jobs.
Create pdw_user on Hadoop and give it sufficient permissions to the directories used during mapreduce processing. Also make sure that pdw_user is the owner of the /user/pdw_user HDFS directory.
Below is an example of how to create home directory and assign permissions for pdw_user:
sudo -u hdfs hadoop fs -mkdir /user/pdw_user
sudo -u hdfs hadoop fs -chown pdw_user /user/pdw_user
After this make sure that pdw_user has read, write, and execute permissions on /user/pdw_user directory and /tmp directory has 777 permissions.
SQL Server 2016 PolyBase is setup with Hadoop Cluster or Azure Blob Storage.
Any Select query fails with the following error.
Msg 106000, Level 16, State 1, Line 1
Java heap space
Illegal input may cause the java out of memory error. In this particular case the file was not in UTF8 format. DMS tries to read the whole file as one row since it cannot decode the row delimiter and runs into Java heap space error.
Convert the file to UTF8 format since PolyBase currently requires UTF8 format for text delimited files.
Customer's intention is to setup SQL Server 2016 PolyBase to connect to Azure Blob Storage.
Msg 105019, Level 16, State 1, Line 74
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs] occurred while accessing external file.'
Hadoop connectivity is not set to the configuration value for accessing Azure Blob storage.
Set the Hadoop connectivity to a value (preferably 7) which supports Azure Blob Storage and restart SQL Server. List of connectivity values and supported types can be found at https://msdn.microsoft.com/en-us/library/mt143174.aspx.
Trying to export data to Azure blob storage or Hadoop file system using PolyBase with CREATE EXTERNAL TABLE AS SELECT (CETAS) syntax from SQL Server 2016.
Msg 156, Level 15, State 1, Line 177
Incorrect syntax near the keyword 'WITH'.
Msg 319, Level 15, State 1, Line 177
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
CETAS is not a supported statement in SQL Server 2016 for PolyBase.
Create the external table first and then use INSERT INTO SELECT to export to the external location. More details can be found at https://msdn.microsoft.com/en-us/library/mt652313.aspx
SQL DW is setup to import data from Azure blob storage. Create external table fails with the following message.
Msg 105019, Level 16, State 1, Line 34
External TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.: Error [com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.] occurred while accessing external file.'
Wrong Azure storage key was used to create the database scoped credential.
Drop all related objects (i.e: data source, file format) and then drop and recreate the database scoped credential with the right storage key.
SQL Server 2016 is setup with Kerberos enabled Cloudera Cluster. SQL Server has been restarted after all the configuration changes. PolyBase Engine and PolyBase Data Movement services are running after restart.
Data Source configured without job tracker location: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.viewfs.ViewFileSystem could not be instantiated
Data Source configured with job tracker location: Error [Can't get Kerberos realm] occurred while accessing external file
The value for the "hadoop.security.authentication" property says kerberos in the Coresite.xml.
Coresite.xml's "hadoop.security.authentication" property should be KERBEROS (all upper case) as the value.
SQL Server 2016 or APS is setup with supported HDP Cluster. Queries without pushdown works but fails with the following message when 'force pushdown' hint is used.
Msg 7320, Level 16, State 110, Line 35
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to JobSubmitter_PollJobStatus: Error [org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:277)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
] occurred while accessing external file.'
Mapred-site.xml is missing some needed values that checks for intermediate and end results.
Add the following properties and associate the correct values as it shows on Ambari in the mapred-site.xml file on SQL Server.
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
SQL Server 2016 is setup to access a supported Hadoop Cluster.
Creating an external table gives the following error:
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.DmsSqlNativeException, Message: SqlNativeBufferReader.Run, error in OdbcExecuteQuery: SqlState: 42000, NativeError: 8680, 'Error calling: SQLExecDirect(this->GetHstmt(), (SQLWCHAR *)statementText, SQL_NTS), SQL return code: -1 | SQL Error Info: SrvrMsgState: 26, SrvrSeverity: 17, Error <1>: ErrorMsg: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Internal Query Processor Error: The query processor encountered an unexpected error during the processing of a remote query phase. | Error calling: pReadConn->ExecuteQuery(statementText, bufferFormat) | state: FFFF, number: 24, active connections: 8', Connection String: Driver={pdwodbc};APP=RCSmall-DmsNativeReader:WAD1D16HD2001\mpdwsvc (3600)-ODBC-PoolId1433;Trusted_Connection=yes;AutoTranslate=no;Server=\\.\pipe\sql\query
Selecting from an external table gives the following error:
8/4/2016 8:19:40 PM [Thread:30544] [AbstractReaderWorker:ErrorEvent] (Error, High): QueryId QID2433 PlanId 6c3a4551-e54c-4c06-a5ed-a8733edac691 StepId 7:
Could not obtain block: BP-1726738607-192.168.225.121-1443123675290:blk_1159687047_86196509 file=/user/hive/warehouse/u_data/000000_0
Microsoft.SqlServer.DataWarehouse.Common.ErrorHandling.MppSqlException: Could not obtain block: BP-1726738607-192.168.225.121-1443123675290:blk_1159687047_86196509 file=/user/hive/warehouse/u_data/000000_0
at Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsBridgeReadAccess.Read(MemoryBuffer buffer, Boolean& isDone)
at Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.DataReader.ExternalMoveBufferReader.Read()
at Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.ExternalMoveReaderWorker.ReadAndSendData()
at Microsoft.SqlServer.DataWarehouse.DataMovement.Workers.ExternalMoveReaderWorker.Execute(Object status)
This error message can show up when the Hadoop Cluster is setup in a configuration where data nodes are only accessible outside the cluster using the Hostname and not the IP address.
Add the following to hdfs-site.xml file on the client (SQL Server) side. This configuration will force the name node to return a URI for the data nodes with the Hostname instead of the internal IP address.
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
SQL Server 2016 is running a PolyBase query on a directory with a large number of files (>30,000 files under the directory path recursively).
Msg 105019, Level 16, State 1, Line 1
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_GetFileNameByIndex. Java exception message:
GC overhead limit exceeded: Error [GC overhead limit exceeded] occurred while accessing external file.'
Msg 105019, Level 16, State 1, Line 1
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_GetDirectoryFiles. Java exception message:
Java heap space: Error [Java heap space] occurred while accessing external file.'
When processing a path, PolyBase enumerates all files under that path, and there is fixed memory overhead associated with the data structure that is used to represent the files. With a large number of files, this overhead becomes noticeable and can eventually consume all memory available to the JVM.
Rearrange the data in multiple directories so that each directory contains a subset of files and then break down the query in multiple ones that operate on a part of the original path at a time and materialize the tables as SQL Server tables (before joining them).
Example: Let’s assume your external table data is in the following location:
Orders/file1.txt,...,file30K.txt.
Even if you had the right directory structure to begin with, follow step #2 to be able to work with that many files without running out of JVM memory.
Setting up SQL Server 2016 or APS with a Hadoop cluster which involves modifying yarn-site.xml, hdfs-site.xml and other configuration files. We have seen this with 2 customers now and felt it was worth mentioning here.
Msg 105019, Level 16, State 1, Line 1
Microsoft.SqlServer.DataWarehouse.Common.ErrorHandling.MppSqlException: EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect. Java exception message:com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.: Error [com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.] occurred while accessing external file.' --->
This can happen if you have copied and pasted some lines of instruction from a website or Skype chat window. These sources that has it's own formatting and can produce unwanted/unprintable characters which when found in the configuration files will show error message as shown above.
Open the files in a smart text editor (something other than notepad) and look for these characters and eliminate them. Restart the necessary services.
This blog does not cover all the issues you can encounter when accessing data by using PolyBase. We plan to revise this document regularly with additional scenarios for the benefit of the masses.
Please sign in to use this experience.
Sign in