How do we test our libraries?

Hi, my name is Jeff Peil and I’m the QA lead on the VC++ libraries team.  Today I wanted to talk a little bit about how we test our libraries.  One of the biggest challenges with libraries is that we don’t just ship them to our customers directly, but that code gets included in the applications that are built using our libraries.  The most exciting part of working on the libraries is that if you do something great, a huge number of people can benefit (because it doesn’t just benefit people building applications with our tools, it can benefit the users of all those applications!)  The most terrifying part of working on libraries is that if we let something slip through, like a security hole, the potential impact is that much bigger.  Thus testing our libraries and catching everything we can is critical.


When I first joined the libraries team and started to think about how we should test our libraries, one thing was particularly clear: there are many different kinds of problems that can occur. For instance, here are a few of them:

- Shape of the API (is it usable/discoverable/flexible?)
  - Even more subtle is that the shape of an API can directly impact the likelihood of someone misusing it in a way that will introduce a bug in their code. Many buffer overruns in code using libraries can be prevented just by getting the shape of the API right.
- Correctness (does the function/class do what it claims to do?)
- Performance
- Leaks (memory/handles)
- Thread safety
- Size/space limits (files >4 GB, large memory allocations on 64-bit, …)
- Security


Additionally, our libraries span a huge scope of problems: from UI controls in MFC to file I/O to sockets to heap management to locales, you run into completely different problem sets.


Just as a thought exercise, consider what it means to properly test a function like rand. You need to:

- Come up with a way to make sure it's good enough at generating random numbers.
  - I've seen people who think it's sufficient to do things like call rand 10 times and see if the same number comes up twice in those 10 calls, but that notion is completely wrong.
  - Generally the best suggestions involve executing rand a large number of times, tallying all of the results, and checking the statistical probabilities.
    - But would even this catch a broken implementation of rand that just returned an incremented integer after each call? The tallies would look perfectly uniform…
- Make sure that rand is fast enough.

Compared to a function like sprintf, the problems are very different, and the testing required is very different. Further, the problems require very different domain expertise.


Given all of these problems, how do we deal with them? Well, we break our types of testing down into categories:

- App building (to prove the shape of the library is good)
- Directed testing (mostly correctness testing)
- Benchmarking (for perf)
- Stress testing (long-running tests to look for leaks and threading issues)

For every feature we plan, we identify what testing in each of these areas makes sense. Our goal is to automate all of the testing we do (so if we build an app, we want to make sure that it keeps building, and make sure we understand any breaking changes we're introducing and what migration pain they would create).


Depending on the features we are working on, the balance between these areas can vary dramatically (for instance, we didn’t do any directed app building around the Secure CRT work, but we did port major code-bases to it instead.)


Once we have a plan in place, we’ll loop in domain experts where possible, as well as more senior team members to review our plans and make sure we don’t have any holes in our test plan.  After we know what we need to do, we can get to implementing tests.


So how do we know we have sufficient testing in an area? Well, we have some techniques that can help. Code coverage is certainly useful for identifying test holes (but note that the presence of coverage does not mean you're in good shape: while code coverage is great for finding holes, it can't tell you how well the covered code is actually being tested). We also, of course, leverage other code where possible (such as building and shipping Visual Studio with the current CRT). Finally, we track bug trends in each area to see if they indicate any problems, and of course we use betas, CTPs, and community feedback as yet another tool to help us identify if we've missed something; your feedback is incredibly valuable to us.


To wrap things up, thanks for taking the time to read this, and if you have any questions or comments, please feel free to email me.


Jeff Peil

QA Lead

Visual C++ Libraries team