Glenn Paulley, Director of Engineering at Sybase iAnywhere, posted a commentary titled “The State of TPC-E” on his blog three weeks ago (10/3/08). A better title would have been “All TPC-E Results Are On Microsoft SQL Server. Why?” Mr. Paulley takes issue with Brian Moran’s statement that “the most rational answer is that Oracle and IBM have tried to top Microsoft’s numbers and simply can’t”. He says that while it may be true, he doubts it and says there are other plausible reasons why DB2 and Oracle have yet to publish any TPC-E results. Curiously, he doesn’t say why Sybase hasn’t published TPC-E results. Since he is, presumably, in a position to know, one can only conclude that he would rather not say. Readers can reach their own conclusions about what that might mean.
To his credit, he cites an IBM whitepaper that explains how TPC-E was designed to be more realistic than TPC-C. The whitepaper details numerous ways in which TPC-E is far superior to TPC-C. As the table below shows, in TPC-E the schema is substantially richer and more complex, there are twice as many transactions, and only TPC-E requires essential capabilities such as referential integrity and RAID-protected storage.
[Table: TPC-C vs. TPC-E. Only the row labels and a few cell values are recoverable.]
Number of database tables
Tables with foreign keys
TPC-C: unrealistic; single dimension common to 8 of 9 tables. TPC-E: two independent dimensions
Number of transactions
Database roundtrips per transaction: min 1; max 5
Referential Integrity Required
Storage Protection (e.g. RAID) for Database Required
Timed Database Recovery test
Mr. Paulley chooses to focus on the query complexity of TPC-E. While that’s somewhat interesting, a comparison to TPC-C would have provided important context. For example, TPC-E has 156 DML statements. TPC-C doesn’t include pseudo-SQL the way that TPC-E does, but if it did and followed the TPC-E style, it would have fewer than 30 DML statements. By this measure (156 / 30 = 5.2), TPC-E has more than five times as many distinct DML operations as TPC-C.
But more importantly, TPC-E is not and was never intended to be a query optimizer test. The pseudo-SQL code in TPC-E is an example, not a requirement. Unlike TPC-H, which strictly limits changes to the specified SQL, TPC-E leaves test sponsors free to rewrite the SQL any way they like as long as it is functionally equivalent. One vendor might rewrite it to remove all joins while another might rewrite it to include more joins or more complex joins. The same is true of GROUP BY and ORDER BY clauses. In our view, Mr. Paulley’s objection that TPC-E isn’t a good optimizer test is misplaced.
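To make the point about rewrites concrete, here is a minimal sketch in Python using the standard sqlite3 module. The two-table schema, the sample rows, and the function names are all hypothetical and invented for illustration; none of them comes from the TPC-E specification. The sketch treats "functionally equivalent" as "returns the same rows on the same data", which is a simplification of whatever the specification actually requires.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical two-table schema.

    The tables and rows are invented for illustration only; they are
    not taken from the TPC-E specification.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE security (symbol TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE trade (
            trade_id INTEGER PRIMARY KEY,
            symbol   TEXT NOT NULL REFERENCES security(symbol),
            qty      INTEGER NOT NULL
        );
        INSERT INTO security VALUES ('AAA', 'Alpha Corp'), ('BBB', 'Beta Inc');
        INSERT INTO trade VALUES (1, 'AAA', 100), (2, 'BBB', 50), (3, 'AAA', 25);
    """)
    return conn


def trades_with_join(conn):
    """Single statement: the join resolves each trade's security name."""
    return conn.execute(
        "SELECT t.trade_id, s.name, t.qty "
        "FROM trade AS t JOIN security AS s ON s.symbol = t.symbol "
        "ORDER BY t.trade_id"
    ).fetchall()


def trades_without_join(conn):
    """Join-free rewrite: read the trades, then look up each name separately."""
    trades = conn.execute(
        "SELECT trade_id, symbol, qty FROM trade ORDER BY trade_id"
    ).fetchall()
    rows = []
    for trade_id, symbol, qty in trades:
        (name,) = conn.execute(
            "SELECT name FROM security WHERE symbol = ?", (symbol,)
        ).fetchone()
        rows.append((trade_id, name, qty))
    return rows


conn = build_db()
joined = trades_with_join(conn)
unjoined = trades_without_join(conn)
# Both forms should produce identical rows in identical order.
print(joined == unjoined)
```

Both forms return the same rows, so under this simplified definition either one would be an acceptable rewrite. Which form runs faster depends on the engine and on how the sponsor chose to write it, not on what the optimizer makes of one fixed query text. That is the sense in which the workload is not an optimizer test.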
After discussing query complexity, Mr. Paulley offers four reasons why Microsoft is the only database vendor publishing TPC-E results.
· “TPC-E is a moving target” – While it’s true that the TPC-E spec is up to version 1.6.0, the assertion that the workload has changed significantly is unsupported by the facts. None of the transactions has changed in any way that affects performance. All spec revisions have been classified as “minor” changes by the TPC, and results across all spec revisions are comparable. The number of revisions to the spec since it was first released actually reflects a deep commitment by the members of the TPC-E committee to clean up rough edges and address areas of ambiguity before they become issues in published results. A better gauge of the high quality of the TPC-E spec is that, to date, 18 results have been published by six vendors over 15 months without a single compliance challenge.
· “Both DBMS vendors and hardware suppliers have a substantial investment in TPC-C expertise.” On this point we agree with Mr. Paulley, but we draw different conclusions. All of the major DBMS companies have spent years picking through every detail of TPC-C. It has been optimized to such a degree that it long ago stopped driving customer-relevant engineering improvements. TPC-C is 16 years old and has changed little since 1992. Saying that we should continue using TPC-C because we know it so well is like saying that we should drive a horse and buggy because we have a lot of expertise in blacksmithing. This is a mindset that is trapped in the past and doesn’t serve our customers.
· “TPC-E isn’t that cheap.” In fact, TPC-E is substantially less expensive to configure and run than TPC-C. Two results from IBM within the last month prove the point. As you can see in the table below, running on the same server, the TPC-C configuration was more than five times as expensive as the TPC-E configuration. Further, on the four-processor server, the TPC-C result had 1361 disks with no data protection, while the TPC-E result had 400 disks with RAID-5. Which is the more customer-relevant configuration?
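[Table: the two IBM results compared. The cost figures and result links are not recoverable.]
Server: IBM System x3850 M2 (TPC-C); IBM System x3850 M2 (TPC-E)
Procs / Cores / Threads: 4 / 24 / 24 (TPC-C); 4 / 24 / 24 (TPC-E)
Total System Cost
Disks: 1,344 x 73.4GB disks and 16 x 500GB disks (TPC-C); 400 x 73.4GB disks (TPC-E)
Data Storage Protection: none (TPC-C); RAID-5 (TPC-E)
TPC Result Details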
· “Customers continue to desire and reference TPC-C results.” Granted, TPC-C has stood the test of time. But today it is outdated, over-optimized, and of questionable relevance. Customers hold on to TPC-C because it is familiar and available, not because it is better. Database vendors need to exercise leadership. As Mr. Paulley says, “Microsoft is an early adopter of TPC-E”. At this point, though, the early adopter window has passed. TPC-E was ratified 20 months ago. The first result was published 15 months ago. There are 18 published results. We believe that customers will readily embrace TPC-E as a superior benchmark as more results become available.
The more time that goes by, the more one is inclined to believe that Brian Moran is right – other database vendors aren’t publishing because they can’t beat the existing SQL Server results. We invite Sybase and Mr. Paulley to prove us wrong. We are confident that once Sybase runs TPC-E instead of just writing about it, Mr. Paulley will gain a new appreciation for just how challenging and technically rigorous TPC-E is compared with TPC-C.
SQL Server Performance Engineering