Test data? What do you mean test data?

One of the things I struggled with as a teacher was getting students to understand the importance of a test plan. Not a plan for how to take a test, although that topic did come up with regard to the Advanced Placement Computer Science (APCS) exam. Rather, what students sometimes had trouble grasping was the importance of planning how to test their computer programs.

My good friend Tom Indelicato, who teaches computer science at Bishop Guertin HS, recently wrote about an experience he had with bad test data. Read the whole entry when you get a chance, but the short version is this: Tom has been working with a Visual Basic version of a program called ChipWits for a number of years. He acquired a set of instructions for this program, which is something of a computer emulator, and tried them in his version. They didn’t work and he could not figure out what was wrong with his program. Only recently did he find out that the problem was not with his program but with his test data.

In one form or another that is a typical problem in testing software. The test data has to be correct. What does correct mean? Well, that gets complicated at times, but my general rule of thumb is that good or correct data has a known outcome. The key thing is that the program should take specific data and return a result that is known in advance. And of course you should make sure that the program handles “bad data”, whatever that means in the context of the program, and returns a predictable result.
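Here is a minimal Python sketch of that rule of thumb. The function parse_quantity and its rule that bad input returns None are made up for illustration; they are not from Tom's program or any particular assignment. The point is only that every test value, good or bad, is paired with a result decided before the program runs.

```python
def parse_quantity(text):
    """Return text as a non-negative whole number, or None if it is not one.

    Hypothetical example function: returning None is the 'predictable
    result' chosen here for bad data.
    """
    stripped = text.strip()
    # Accept only plain ASCII digits; anything else counts as bad data.
    if not (stripped.isascii() and stripped.isdigit()):
        return None
    return int(stripped)


# Each pair is (input, result known in advance).
KNOWN_OUTCOMES = [
    ("12", 12),     # ordinary good data
    ("0", 0),       # smallest valid value
    ("", None),     # bad data: empty input
    ("abc", None),  # bad data: not a number
    ("-3", None),   # bad data: negative
    ("4.5", None),  # bad data: not a whole number
]

# True for every case where the program matched the expected result.
outcome_checks = [parse_quantity(text) == expected
                  for text, expected in KNOWN_OUTCOMES]
```

If any entry in outcome_checks is False, either the program or the expected value is wrong, and both are worth a second look.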

Many times I have watched a student working on a program enter data and get a result. If the program did not crash they assumed the program worked. Ah, the confidence of youth. So naturally I would ask them how they knew the answer was correct. About half the time the answer I got was “I don’t know.” Most of the rest of the time the answer had something to do with an assumption that the computer was doing the right thing. In short, trust the computer. That was the teachable moment, a chance to explain a couple of things. One is that it’s not the computer I need to trust but the programmer. The other is that assuming something is right is seldom a good idea.

For many projects I assigned I would supply test data. In some cases this makes sense. For example, one of the first programs I used to assign, if not the very first, was a temperature conversion program. At that point in the course the students had few programming skills and very little idea about testing. So I would explain the test data and why I picked it. I always used three data points: freezing, boiling and -40. I would explain that I picked those values because I knew what results I would get without having to do the math by hand: water freezes at 32 Fahrenheit and 0 Celsius, boils at 212 Fahrenheit and 100 Celsius, and -40 is the one temperature that reads the same on both scales.
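As a rough sketch, those three data points might be checked like this in Python. The direction of the conversion (Fahrenheit to Celsius) and the names fahrenheit_to_celsius and run_tests are assumptions made for the example; the original assignment may have looked quite different.

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# Each pair is (input in Fahrenheit, expected result in Celsius).
TEST_DATA = [
    (32, 0),     # freezing point of water
    (212, 100),  # boiling point of water
    (-40, -40),  # the one value where both scales agree
]


def run_tests():
    """Return one True/False per test case, in the order of TEST_DATA."""
    results = []
    for fahrenheit, expected in TEST_DATA:
        actual = fahrenheit_to_celsius(fahrenheit)
        # Compare with a small tolerance because the result is a float.
        results.append(abs(actual - expected) < 1e-9)
    return results
```

Nobody needs a calculator to know what the answers should be, which is exactly why these values make good first test data.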

For later projects we had discussions about boundary checking. This was especially important around the time we talked about arrays. One always wants test data on either side of the boundary condition as well as on the boundary itself. This is the same time students usually discover division by zero and subscript out of range errors.
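A small Python sketch of what boundary test data can look like for an array follows. The helpers safe_get and average and the sample scores are invented for illustration. Note that Python itself would quietly accept an index of -1 and count from the end of the list, so the range check is written out explicitly.

```python
def safe_get(items, index):
    """Return items[index], or None when index is outside 0..len(items)-1."""
    if 0 <= index < len(items):
        return items[index]
    return None


def average(values):
    """Return the mean of values, or None for an empty list.

    The empty-list check is what prevents a division by zero.
    """
    if len(values) == 0:
        return None
    return sum(values) / len(values)


scores = [70, 85, 90]  # valid indexes are 0, 1 and 2

# Test data on both sides of each boundary and on the boundary itself.
BOUNDARY_CASES = [
    (-1, None),  # just below the lower boundary
    (0, 70),     # the lower boundary
    (1, 85),     # safely inside
    (2, 90),     # the upper boundary
    (3, None),   # just past the upper boundary
]

boundary_results = [safe_get(scores, index) == expected
                    for index, expected in BOUNDARY_CASES]
```

The empty list plays the same role for average that index 3 plays for safe_get: it is the value most likely to be forgotten and most likely to crash the program.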

One problem with providing test data is that the occasional student, too lazy to fix a bug, will write special-case code around it so that the program works with the supplied test data. That sort of code is open to all sorts of other errors and often will not work correctly with other data. For that reason I always made it clear that I would use a different set of test data for grading projects. This seemed to prevent most of the special casing, but it had the more important value of causing many students to develop their own test data for their programs. And that is a good thing.