Scientific testing is based on controls, transparency, and repeatability. Whenever we as technical professionals want to test the performance of a database system, we search for a series of tests that show the system’s metrics against a standard.
But the scientific basis for using the most common standard, the Transaction Processing Performance Council (TPC) measurements (http://www.tpc.org/), is difficult for most database professionals. The TPC metrics are divided into “Benchmarks”, classified as C, DS, E, H and “Energy” as of this writing. These cover everything from OLTP (in multiple variants) and virtualization technology to business-intelligence workloads. It takes no small amount of study to understand what these measurements show and how they apply to the systems that are tested.
And that forms the main issue with TPC numbers – the testing is done by and for the various database vendors (Microsoft included), which leads to problems in the other areas: controls, transparency and repeatability. While the TPC standard is public (and lengthy, and sound), each vendor tunes the hardware, platform and workloads as much as possible to favor its database (controls), and rarely discloses those parameters (transparency), which in turn prevents you from reproducing the results to verify them (repeatability).
And in the end, none of this matters anyway – your workloads don’t resemble those controls at all. They are a standardized, statistically distributed way of measuring various vendor systems and hardware using transactions. What you actually want is something that resembles your current and future workloads, along with a standard way of reproducing the results. So in many shops where I’ve worked, I created my own tests. This works, but I was never sure that I had covered all of the areas needed to ensure that the workloads were representative.
So at Microsoft we’re starting to focus more on a scientific methodology that more closely resembles real-world workloads, is repeatable on your own systems, and is described (starting with our SQL Databases offering in Microsoft Azure) in a published document. We call this new measurement “Database Throughput Units” or DTU. You can find the complete document here: http://msdn.microsoft.com/en-us/library/azure/dn741327.aspx. It’s short – and that’s on purpose. A simpler description allows you to replicate what we’ve done and change it to be more relevant to your own workloads. Almost all parts of the process are under your control. And while we have published standards based on our testing, we recommend you use the same methodology on all your systems and ours, to show a true benchmark. The culmination of the process is what the user experiences – the time it takes to make a request for a database operation and get a result. That’s all users care about, and in the end it’s what your final decision will be judged on.
There are multiple areas in the standard, including:
- The Schema – A structure with enough variety and complexity to exercise the broadest range of operations.
- Transactions – A mix of types within the CREATE, READ, UPDATE and DELETE operations (CRUD Matrix) that can be tuned to a real-world observation.
- Workload Mix – A distribution of the above measures that more accurately resembles your environment.
- Users and Pacing – The number of virtual “users” a measurement should simulate, and how often each user performs each action, to show the spikes, lulls and other anomalies faced in real-world systems.
- Scaling Rules – A scale factor applied to the number of virtual users per database.
- Duration – The length of the test run – one hour is considered the minimum, and longer is better for statistically meaningful results.
- Metrics – DTU focuses on only two end measurements for simplicity: throughput and response time.
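To make these areas concrete, here is a minimal load-driver sketch that wires them together: a weighted CRUD mix, a configurable number of virtual users with pacing between requests, a fixed duration, and the two end metrics (throughput and response time). This is not the DTU benchmark itself – the mix percentages, user count, pacing and the stub `run_operation` function are all hypothetical placeholders you would replace with your own observed workload and real database calls.

```python
import random
import threading
import time

# Hypothetical CRUD mix; tune these weights to match a real-world observation.
CRUD_MIX = {"create": 0.10, "read": 0.65, "update": 0.20, "delete": 0.05}

def run_operation(op):
    """Stand-in for a real database call; replace with your driver code."""
    time.sleep(0.001)  # simulate a ~1 ms database round trip

def virtual_user(duration_s, pace_s, results, lock):
    """One simulated user: pick a weighted CRUD operation, time it, pause."""
    ops, weights = zip(*CRUD_MIX.items())
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        op = random.choices(ops, weights=weights)[0]
        start = time.monotonic()
        run_operation(op)
        elapsed = time.monotonic() - start
        with lock:
            results.append((op, elapsed))
        time.sleep(pace_s)  # pacing: the gap between one user's requests

def run_benchmark(users=5, duration_s=2.0, pace_s=0.01):
    """Run the virtual users for the given duration, return the two metrics."""
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=virtual_user,
                                args=(duration_s, pace_s, results, lock))
               for _ in range(users)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.monotonic() - start
    # The two end metrics: throughput and average response time.
    throughput = len(results) / wall
    avg_response = sum(elapsed for _, elapsed in results) / len(results)
    return throughput, avg_response

if __name__ == "__main__":
    tps, resp = run_benchmark()
    print(f"throughput: {tps:.1f} ops/s, avg response: {resp * 1000:.2f} ms")
```

A scaling rule then becomes a simple function from database count to `users`, and a longer `duration_s` gives you the statistically meaningful run the standard calls for.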
You can read the full document at the link above. As always, all comments are welcome.