SQL Server Error: 7320 Cannot execute the query "xxx" against OLE DB provider "DB2OLEDB" for linked server "xxx"

Here's a little something that me and my colleagues were looking at this week. Our customer was executing queries against a DB2 linked server using the latest provider and occasionally got these errors. One of my colleagues suggested to set a dump trigger on the error 7320 so that we could capture a sqlservr.exe dump when the problem next occurred. We achieved this by running

dbcc dumptrigger('SET',7320)

on the server. By default this will collect a mini-dump, unless you have enabled any other trace flags previously. When we had encountered the error and analyzed the dump, it showed up the following stack

sqlservr!ex_raise+0x52
sqlservr!COledbError::HandleExecuteError+0x11b
sqlservr!COledbRangeRowset::FDoPerformSchemaCheckForQuery+0xc7
sqlservr!COledbRangeRowset::FPerformSchemaCheck+0x1fb
sqlservr!COledbRangeRowset::FCheckSchema+0x2a
sqlservr!CEnvOledbRange::XretSchemaChanged+0x16
sqlservr!CXStmtDML::XretDMLExecute+0x41037b
sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0xace
sqlservr!CMsqlExecContext::FExecute+0x439
sqlservr!CSQLSource::Execute+0x355
sqlservr!CStmtExecStr::XretExecStrExecute+0x48d
sqlservr!CXStmtExecStr::XretExecute+0x20
sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0x39e
sqlservr!CMsqlExecContext::FExecute+0x439
sqlservr!CSQLSource::Execute+0x355
sqlservr!CStmtExecProc::XretLocalExec+0x125
sqlservr!CStmtExecProc::XretExecExecute+0x132
sqlservr!CXStmtExec::XretExecute+0x20
sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0x39e
sqlservr!CMsqlExecContext::FExecute+0x439
sqlservr!CSQLSource::Execute+0x355
sqlservr!process_request+0x312
sqlservr!process_commands+0x3fd

 

Before going into any in depth debugging which would involve monitoring the remote server and the components in between, my colleague thought it would be sensible to check the setting surrounding Schema Validation as we were obviously throwing the error in a routine related to this functionality (FDoPerformSchemaCheckForQuery). So we changed the option of the linked server to say enable "Lazy Schema Validation", using this syntax:

exec sp_serveroption xxxxx, 'lazy schema validation', true

If you look this option up in MSDN under "Configuring OLE DB Providers for Distributed Queries" you'll note the following regarding this option:

"If this option is set to false, the default value, SQL Server checks for schema changes that have occurred since compilation in remote tables. This check occurs before query execution. If there is a change in the schema, SQL Server recompiles the query with the new schema.

If this option is set to true, schema checking of remote tables is delayed until execution. This can cause a distributed query to fail with an error if the schema of a remote table has changed between query compilation and execution.

You may want to set this option to true when distributed partitioned views are being used against a linked server that is running SQL Server. A specified table that participates in the partitioned view may not be actually used in a specific execution of a query against the view. Therefore, deferring the schema validation may improve performance."

Now in all truth this was quite a pragmatic approach, in that we hadn't fully debugged the problem, we were just taking an educated guess, that we were experiencing a problem surrounding schema validation, and so far, since implementing this option, the errors have not re-occurred, so we've managed to workaround the problem.

For reference the customer in question was also experiencing other errors in the 7300 to 7320 range, which have also stopped occurring since this option was set. Errors such as

Error: 7350 Cannot get the column information from OLE DB provider "DB2OLEDB" for linked server "xxx"

Error: 7321 An error occurred while preparing the query "xxx" for execution against OLE DB provider "DB2OLEDB" for linked server "xxx"

Error: 7303 Cannot initialize the data source object of OLE DB provider "DB2OLEDB" for linked server "xxx"

The moral of this story is that linked servers can be very difficult and time consuming to troubleshoot, due to the number of distributed components, including ones outside of the SQL Server area completely. This is one time where I am a fan of testing some of the different settings in the provider to see if the problem can be worked around initially, before doing some in depth analysis. In most other cases I'd far prefer to know why, and we may yet revisit this particular situation to fully debug it, but sometimes a workaround and an error free production environment is also nice for a quick win.

For future reference another interesting quick diagnostic that can be performed on linked servers is to run Process Monitor on the server in question to see whether you are having any obscure permission of registry problems. I've seen this give some quick wins as well.