The Scientific Method and testing OneNote performance


A few weeks ago we finished one of our (many) rounds of performance testing and were looking at the numbers.  Some of the numbers looked very good, and others showed some "areas for improvement."  One of the folks involved stopped by and said that because not all the numbers were good enough, our round of performance testing had not met its goals.


I completely disagreed.  The goal of the performance work was to establish an accurate measure of OneNote performance, and we managed to do that.  When we set out to gather the numbers, we followed the Scientific Method of testing the system:


  1. We asked a question:  how many bytes does OneNote send over the wire to the server when syncing a notebook?

  2. We did our research.  We looked at the number of bytes sent over the wire by previous versions of OneNote and used those numbers as a baseline.

  3. We constructed a hypothesis.  We assumed that the number of bytes over the wire would be roughly equal to the size of the notebook in bytes + the size of the headers sent in bytes + the size of the authorization calls in bytes. 

  4. We tested our hypothesis by using the Fiddler tool to analyze the client requests to our SharePoint server.

  5. We analyzed the data we gathered and drew conclusions.  Our conclusion was that we could do better; you can't have too much performance!

  6. I sent a report summarizing our findings to our stakeholders.
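Steps 3 through 5 boil down to simple arithmetic: predict a byte count, measure the real one, and compare the two.  Here's a minimal sketch of that comparison.  It assumes the per-request sizes have already been exported from a Fiddler session into a plain list.  The class and function names, the 10% tolerance, and every byte count below are hypothetical illustrations, not our real numbers.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical inputs for the step 3 estimate (all sizes in bytes)."""
    notebook_bytes: int
    header_bytes: int
    auth_bytes: int

    def expected_bytes(self) -> int:
        # Step 3: notebook size + headers + authorization calls
        return self.notebook_bytes + self.header_bytes + self.auth_bytes


def measured_bytes(request_sizes: list[int]) -> int:
    # Step 4: total bytes sent, summed over every captured client request.
    # Assumes the sizes were already exported from the capture tool.
    return sum(request_sizes)


def compare(h: Hypothesis, request_sizes: list[int], tolerance: float = 0.10) -> dict:
    # Step 5: how far does the measurement land from the prediction?
    expected = h.expected_bytes()
    if expected <= 0:
        raise ValueError("expected byte count must be positive")
    actual = measured_bytes(request_sizes)
    overhead = (actual - expected) / expected
    return {
        "expected": expected,
        "actual": actual,
        "overhead_ratio": overhead,
        # The hypothesis "holds" if actual is within +/- tolerance of expected
        "hypothesis_holds": abs(overhead) <= tolerance,
    }


if __name__ == "__main__":
    # Made-up numbers purely for illustration
    h = Hypothesis(notebook_bytes=500_000, header_bytes=20_000, auth_bytes=5_000)
    print(compare(h, [300_000, 200_000, 90_000]))
```

With these made-up numbers the client sends 590,000 bytes against a prediction of 525,000, roughly 12% more, so the hypothesis does not hold at a 10% tolerance.  That is still a useful result: it tells you where to start digging.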


This looks like a successful trial to me.  We started without any data, followed a plan, and gathered quantified data that we know is accurate.  We can act on that data and make changes to OneNote to try to get the results we want.  Now that our tools are in place, we can verify the effect of changes to OneNote over time and keep measuring performance.  We can use these measurements to make further decisions about OneNote.


The mistake the fellow made was assuming that because we did not have "perfect performance" from the start, our testing had failed.  That misses the point of testing.  We need to be able to gather meaningful, accurate statistics that tell us about the stability of the product.  Failing to get meaningful numbers (in this case, for example, by timing runs with a stopwatch known to be faulty) would indeed be a testing failure.  My co-worker misunderstood the intent of our performance testing focus week.  We wanted to ensure we had a solid baseline from which to analyze future changes.  Now we do, and that is success.



I guess what I'm trying to get at is that, as a tester, I want to know that the tests I run are telling me about the state of the product I'm testing.  I'm concerned about the product, of course, but my primary role is to accurately report that product's current state.  If my tests aren't telling me anything about the product, don't deliver consistent results, or otherwise give me inaccurate data, then I am very worried and need to rethink my tests.  I would not be able to complete step 5 above, because I could not draw any conclusions.
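One way to check whether a test delivers consistent results is to run it several times under the same conditions and look at the spread of the numbers.  Below is a minimal sketch using the coefficient of variation (standard deviation divided by mean).  The `run_sync_test` callable and the 5% threshold are hypothetical choices for illustration, not part of our actual test suite.

```python
import statistics
from typing import Callable


def measure(run_sync_test: Callable[[], float], runs: int = 5) -> list[float]:
    # Repeat the same (hypothetical) test several times under identical conditions
    return [run_sync_test() for _ in range(runs)]


def is_consistent(samples: list[float], max_cv: float = 0.05) -> bool:
    """Return True when the coefficient of variation (stdev / mean) is within max_cv."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to judge consistency")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    # Sample standard deviation relative to the mean
    return statistics.stdev(samples) / mean <= max_cv


if __name__ == "__main__":
    # Made-up byte counts: tight runs vs. wildly scattered runs
    print(is_consistent([590_000, 591_200, 589_800]))
    print(is_consistent([400_000, 590_000, 800_000]))
```

If the scattered case is what your test produces, the numbers can't support a conclusion yet, and the test itself is the thing to fix.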


Likewise, I want to progress steadily through my testing.  It's pretty easy to start "testing in a circle": you skip step 2 (doing research beforehand), then run step 4 without ever realizing you never had a hypothesis to test.  This probably deserves some more thought.  I'll try to follow up in the future.


Questions, comments, concerns and criticisms always welcome,

