Migrating off of Teradata and Business Objects onto SQL Server 2005

Lately, one of my missions in life has been migrating clients off of Teradata and Business Objects and onto SQL Server 2005 (database engine, Analysis Services and Reporting Services). I'm just putting a couple notes to self out here so that I don't forget them for future assignments. (No major search engine turned up the Irritating Error Message™ yet.)

The most common steps required to convert the BO-generated "SQL" statements (for Teradata) straight into T-SQL (at least for me, so far) are:

  1. Removing the schema name from object references
  2. Adding ING to SUBSTR
  3. Replace the ordinally-valued GROUP BY clause members with column-named members
  4. Replace @variable(…) and hard-coded values in JOIN and WHERE clauses (where appropriate for query parameterization) with SQL variables
  5. Remove any and all columns from the SELECT and GROUP BY clauses that are not displayed in the report

Everything else so far seems to be pretty cross-compatible, ANSI-wise. A much nicer, easier migration than I've typically experienced from Oracle to SQL Server, but I digress...

Teradata apparently supports ordinally referencing columns in the GROUP BY clause, but trying to do so in SQL Server 2005 results in the following Irritating Error Message™:

Each GROUP BY expression must contain at least one column that is not an outer reference.

Although the error isn't really caused by an "outer reference" (as I understand the phrase), it's an easy fix. Anyway, the strategy of using "Find-and-Replace with Regular Expressions" to convert all these queries is out the window (no pun intended). *sigh* Note: Examples of converted Teradata/BO queries have been replaced (to protect the innocent) by equivalent T-SQL statements that will run as advertised in the AdventureWorks sample database.

--> Teradata example of ordinal column reference

SELECT

   c.FirstName,

   c.LastName,

   cc.CardType,

  count(*) AS Total

FROM HumanResources.Employee e

INNER JOIN Person.Contact c ON e.ContactID = c.ContactID

INNER JOIN Sales.SalesPerson sp ON e.EmployeeID = sp.SalesPersonID

INNER JOIN Sales.SalesOrderHeader soh ON sp.SalesPersonID = soh.SalesPersonID

INNER JOIN Sales.CreditCard cc ON soh.CreditCardID = cc.CreditCardID

GROUP BY

   2, --> This doesn't work in SQL Server. You have been warned. ;-)

   1,

   3

--> SQL Server equivalent for the same query

SELECT

   c.FirstName,

   c.LastName,

   cc.CardType,

  count(*) AS Total

FROM HumanResources.Employee e

INNER JOIN Person.Contact c ON e.ContactID = c.ContactID

INNER JOIN Sales.SalesPerson sp ON e.EmployeeID = sp.SalesPersonID

INNER JOIN Sales.SalesOrderHeader soh ON sp.SalesPersonID = soh.SalesPersonID

INNER JOIN Sales.CreditCard cc ON soh.CreditCardID = cc.CreditCardID

GROUP BY

   c.LastName,

   c.FirstName,

   cc.CardType

 

<editorial_opinion>

This ordinal reference seems like a handy shortcut at first glance, but for Real World™ queries, it would become a maintenance nightmare! I hope that it's shorthand that we don't ever support in T-SQL or LINQ!! Imagine trying to sort out what's wrong with your GROUP BY 1,7,9,8,2,3,4,5 clause after you make some complex changes to rearrange your to your SELECT clause! Ouch.

</editorial_opinion>

 

All things considered, lowering TCO and getting better flexibility and performance with SQL Server 2005 is pretty easy. You just might have to cut-n-paste a few extra characters for column names in your GROUP BY clauses. Heh!

 

Some additional caveats that I'd like to append are:

 

  1. Some of the BO-generated SQL seems to be very convoluted and suboptimally performant. Significant performance gains could potentially be made by improving the JOIN syntax (such as moving equality tests from the WHERE clause to the ON subclause of the JOIN. There are many suboptimal subqueries throughout as well as implicit JOINs. Wherever possible, implicit JOINs and subqueries should be avoided in favor of explicit JOIN syntax (such as INNER JOIN) and the creation of new entities for JOINing where necessary.
  2. Wherever relational-based reports are the production standard, serious consideration should be given to using developer-created stored procedures instead of raw GUI-generated or protein-generated T-SQL statements. Stored procedures provide an enormous number of benefits over raw SQL statements that space constraints prevent us from covering here.
  3. Test THOROUGHLY!!

Then just paste the result (ideally the stored procedure name) into the Report Wizard and keep clicking next...