Missing data when querying IBM DB2 tables due to code page issues

We recently worked on a support case where a customer was using the Data Access Tool (DAT), which is included with Host Integration Server and the Microsoft OLE DB Provider for DB2 that is included in SQL Server Feature Packs, We often suggest that you use the DAT to create the connection string (UDL) that is used to connect to your IBM DB2 systems and then to run the Sample Query to the connection string to see if you are able to make a basic query without error.

In this case, the data returned by the Sample Query was incomplete.

The Sample Query issues a query against the SYSTABLES table in DB2. SYSTABLES is a system catalog tables that contains information about each table that is defined in the database. When you run the Sample Query in the DAT, it sends a query to DB2 to get a list of the tables that are defined in the Schema that is specified in the connection string. The following is a sample query that is sent to DB2 for AS/400 when the Schema is set to TESTUSER:

SELECT '' as TABLE_QUALIFIER, VARCHAR (RTRIM (TABLE_SCHEMA), 128) as TABLE_OWNER, VARCHAR (RTRIM (TABLE_NAME), 128) as TABLE_NAME, VARCHAR (RTRIM (TABLE_TYPE), 128) as TABLE_TYPE, VARCHAR (TABLE_TEXT, 51) as REMARKS, CASE WHEN TABLE_TYPE = 'A' THEN 1 WHEN TABLE_TYPE IN ('P', 'T') THEN 2 WHEN TABLE_TYPE IN ('L', 'V') THEN 3 ELSE 4 END AS SORTORDER, VARCHAR (RTRIM (BASE_TABLE_SCHEMA), 128), VARCHAR (RTRIM (BASE_TABLE_NAME), 128) FROM QSYS2.SYSTABLES WHERE TABLE_SCHEMA = 'TESTUSER' ORDER BY SORTORDER, TABLE_OWNER, TABLE_NAME FOR FETCH ONLY

Actually, this is the second SYSTABLES query that is sent by the DAT because the first query actually looks for SYSTABLES in the specified schema first. The first query would have specified TESTUSER.SYSTABLES for the query. If SYSTABLES is not in that location, the query completes with an error and the query above is sent to look for SYSTABLES in the default schema (or library when connecting to DB2 for AS/400).

The result of the query should be a list of tables that exist in the specified SCHEMA. These tables along with some additional information about these tables is displayed in a grid in the DAT. The following is an example of these results using example data:

Table_Catalog Table_Schema Table_Name Table_Type Table_GUID Description Tables_Properties Date_Created Data_Modified
DB2HOST TESTUSER TABLE1 TABLE (null) Test 1 (null) (null) (null)
DB2HOST TESTUSER TABLE2 TABLE (null) Test 2 (null) (null) (null)
DB2HOST TESTUSER TABLE3 TABLE (null) Test 3 (null) (null) (null)
DB2HOST TESTUSER TABLE4 TABLE (null) Test 4 (null) (null) (null)

In this particular case where the Sample Query data was incomplete when viewed in the DAT, the TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, and Description columns didn’t include any data. The TABLE_QUALIFER column did contain the correct data.

We suggested that they also try running this same query from a Linked Server within SQL Server using the Microsoft OLE DB Provider for DB2 (which is what the DAT was also using). The results were the same in that the data was not correctly displayed.

At this point, we had to proceed to our normal troubleshooting steps which meant capturing the scenario with DB2 Network Library traces (using snatrace.exe) and a network trace that captured the data flow between the Windows Server and the IBM DB2 system (an iSeries or AS/400 in this case).

After looking at the traces, we found that the problem was caused by a code page (CCSID) mismatch. DB2 was returning the metadata for the SYSTABLES information for the tables that were defined in the SCHEMA specified in the connection string. Part of the metadata is the CCSID (code page) for each of the columns in the SYSTABLES table. The following table is a breakout of the CCSID information returned for the key columns:

Column_Name CCSID
Table_Catalog X’0025’ (37)
Table_Schema X’7025’ (28709)
Table_Name X’7025’ (28709)
Table_Type X’7025’ (28709)
Description X’3A9’ (937)

 

CCSID 28709 is Traditional Chinese (extended) and CCSID 937 is Traditional Chinese (extended) including 4370 UDC (user defined characters). 28709 is a Single-Byte Character Set for Traditional Chinese, while 937 is a Mixed Byte Character Set for Traditional Chinese.

So DB2 is saying that these columns are defined using Chinese code pages, even though they don’t actually contain Chinese data. We never discovered why this was the case, but it explained why the data was not being displayed in the DAT following the query.

The connection string included the following settings for the code pages:

Host CCSID=37
PC Code Page=1252

CCSID 37 is an EBCDIC – U.S./Canada code page and this code page is not able to translate the incoming “Chinese” data to code page 1252 (ANSI – Latin I).

You can also see the data conversion problem when viewing the DB2 Message trace (DB2MSG1.ATF) captured by the snatrace.exe tool. The following shows the data conversion sequence:

Converting multibyte data with CCSID 0X03A9 to multibyte data with CCSID 0X04E4

NLS   -----------------------------------------------------------------------
NLS   Input: Length = 7
NLS   D7D9D6C4 C4E3C1      
NLS   -----------------------------------------------------------------------
NLS   Output: Length = 0

GetColumnData for column 3
Converting multibyte data with CCSID 0X03A9 to multibyte data with CCSID 0X04E4

NLS   -----------------------------------------------------------------------
NLS   Input: Length = 8
NLS   C3D5C2C6 F0F4F1F1      
NLS   -----------------------------------------------------------------------
NLS   Output: Length = 0

GetColumnData for column 5
Converting multibyte data with CCSID 0X03A9 to multibyte data with CCSID 0X04E4

NLS   -----------------------------------------------------------------------
NLS   Input: Length = 50
NLS   40404040 40404040 40404040 40404040 40404040 40404040 40404040 40404040
NLS   40404040 40404040 40404040 40404040 4040   
NLS   -----------------------------------------------------------------------
NLS   Output: Length = 0

In the data above, you can see the “from” CCSID (e.g. 0X03A9, which is 937) and the “to” CCSID (e.g. 0X04E4, which is 1252). As you can see above, the “Output Length” for the data conversion is 0, so no data is displayed in the DAT output because there is no converted data to display.

The solution to this was to change the code page settings specified in the connection string from the settings above to the following settings:

Host CCSID=937
PC Code Page=950

After making this change, the data for all of the columns was correctly displayed in the DAT when running the sample query.

As a side note, we also created a sample C# application that used the Managed Provider for DB2 (included with Host Integration Server) to perform this same SYSTABLES query. In this case, all of the data was correctly written when using this application. This brings up the question as to why this application using the Managed Provider for DB2 did not experience the same code page problem as the OLE DB Provider for DB2.

The answer is that in C# all string (CHAR) data types are considered to be Unicode. Therefore, the Managed Provider for DB2 uses a Unicode CCSID when doing the code page conversions and this code page can handle the “Chinese” data (which really isn’t Chinese, it just indicates Chinese based on the CCSID for the columns).

In the failing case (non C#), the Data Access Tool and the Distributed Query Processor (used by Linked Servers in SQL) are smarter in that they support multiple data types so they don’t assume all string data types are Unicode. These consumers actually try to use the data types indicated in the returned metadata from DB2.

The moral of the story is that unexpected code page issues can pop up and cause some unusual symptoms when running queries against DB2.