On code generation, testing and code reuse

Every once in a while I look at the statistics of my blog to see which post is getting more traction. Yesterday I published a post about a function which I found out recently, and seeing the same mistakes I had made in a recent code review I thought it would be nice to write about it so that if someone else came about the post, they would hopefully learn a little about it. I really expected that post to have a lot lower view count than others (in which I write about things which I have more experience with), and I was impressed to see that it already had surpassed the number of views than the previous post. And when I looked at the detailed stats, many of the views came from a link in reddit, and I was curious to why that was picked up there.

It was quite interesting to see the discussion on the list. I’ll ignore some of the trolling ones, but some are interesting. From “extremely narrow base of technological experience” to “[failure to] abstracting it into a function called ‘join’ or ‘concatenate’ or something is not senior” to “dev doesn't know to google a common programming problem before trying to implement a solution”, it’s actually humbling to know the outside view of my technical abilities. The first one is actually somewhat accurate (I wouldn’t use the “extremely” qualifier Smile), since there are many areas which I haven’t worked on much (OS/kernel programming, databases, graphical tools, iOS / android, distributed security, functional languages, and many others). Even in the limited world of WCF, where I’ve worked for quite some time, I’d say what there are many areas where I only have limited experience (such as transactions, security, COM+ and queuing integration). We’re always trying to learn more, and hopefully widening the technology experience will just come with time.

For the other comments, though, I think it’s nice to add some context, simply because I don’t want my mistake to reflect badly on testers in Microsoft in general (as was implied in one of the comments). And this isn’t some excuse why the code I wrote was bad – it was bad and I learned from it, but some context is important, so let’s go on.

Code generation for multiple scenarios

As I mentioned in my previous post, I created quite a few classes using the anti-pattern of not using a join operation when stringifying enumerations. That is true, but the why is interesting as well. When I joined the WCF team (about 7 years ago, as a “junior” software design engineer in test), I was assigned to test its serialization capabilities (including, but not limited it, the DataContractSerializer and the NetDataContractSerializer). Those serializers are fairly complex, and have many, many rules of what can and cannot be serialized, and the behavior in the presence (or absence) of certain attributes. There are many special cases, including different types, primitives (and serializer primitives vs. complex types), handling of collections, etc. One of the first things I (and another colleague at the time) decided to do was that we couldn’t (or shouldn’t) create “by hand” all the test types with the combinations we needed, so we decided to automate that process – we were writing code to write code. We ended up with a “type library” fairly complex (with hundreds of types of various combinations used by the serializer), and we were able to use that to find several bugs that were fixed before the product was shipped.

The type library had not only the members which we were interested for serialization testing, but each type also implemented Equals. That was great, so that the types themselves could validate whether a round-trip serialization was successful or not, making the “main” of the test code itself quite small – take a type, create an instance of it (itself a topic for another post), serialize, deserialize, compare with original instance. For completeness sake, we even added a GetHashCode implementation for them (even for types not used in keys in dictionaries), to fix all the build warnings in the system.

At that point we started seeing a great return on the type library, so we decided to expand it. After a while, the test code generator became our hammer, and many things started looking like a nail. I mentioned that I’ll have a separate post about instance creation, but one of the things we used the type library creator was to add code in each of the types we created to know how to create itself. That was a lot of code in the library which should have been elsewhere. And since we were there, we also decided to override ToString in the types, so that tracing them (in case of failures) would be really simple – that’s probably when I committed my first sin of overlooking string.Join. And since that was used for creating many types, the anti-pattern multiplied. And since the tests were actually running fine (the code was doing what it needed to do, after all, although with probably more instructions than it should), we had no reason to revisit them.

Fast forward one year (or two, I don’t remember), .NET Framework 3.5 was being developed and we added another serializer, the DataContractJsonSerializer. The type library was very instrumental in finding bugs in its first time, it was used again. Since JSON isn’t as expressive as XML, some tweaks needed to be made to the creator, and at the end it started again spewing code with the anti-pattern.

It was only after coming back to that code a few years later which I realized the bad practice I had created (as I mentioned, the code was running fine – it was technically correct, after all – so nobody was paying attention to it). When I started looking at some additional types for testing model binding (a kind of deserialization) on the ASP.NET Web API that I realized the code on the library just didn’t look right, which is when I started using string.Join came to the rescue. I actually didn’t give much attention at the time, but after seeing the same anti-pattern in a code review I decided to write about it.

The main takeaway about code generation is that it is quite a powerful tool, but it has to be used with caution – any bad code which is introduced will be multiplied in the actual code. Even the assumption that we made at the time that any bugs we had in the generated code could be fixed in only one place (the generator) by simply recreating the library again doesn’t hold – existing code (especially code that “runs fine”) tends to be used as a template for new code for more junior developers / testers. So use code generation whenever it makes sense, but make sure that it goes through extra scrutiny (yes, I’m talking about code review, which shouldn’t have let the bad code pass in the first place) so that it doesn’t start spreading anti-patterns throughout your codebase.

Bug repros: to simplify or not to simplify

Another point which was raised in the discussion is why not simply have a helper function to do the formatting, and have the ToString code call it? That could have been done even in the case of generated code, although the benefits aren’t as obvious (in both cases, any change requires modification in only one place, at least on my original assumption). The fact that I work in the test team has a big influence in why we decided at that point to simply have the code self-contained versus using some additional class. This is actually another philosophical discussion for which I changed my opinion throughout the years. For that, I’ll give some perspective on the work of my test team (in a not-so-distant path) at Microsoft.

When testing a framework (not an application) upon which many applications can be built, and which can be used in many different environments (OS versions, OS SKUs, framework versions, cultures, hosting process, etc.), and in order to do that, we have a fairly complex test infrastructure (in some cases too complex, but that’s a topic for another discussion). So the tests actually need to be written targeted at those infrastructures, and sometimes the price we pay for the power to run in many environments is that even simple tests require a lot of boilerplate to interact with the framework itself.

Side note: the “impedance mismatch” between simple usages and the test framework used to be a big problem in the WCF test team; the team has come a long way since then that this problem still exists, but is a lot less acute. In my current team (ASP.NET Web API), we’re actually sharing a lot of the testing framework used by the developers (to write unit tests), so this problem is quite small nowadays, at least in my team.

Now, what is the role of the test team at Microsoft? There are many alternative definitions (measure quality, identify defects, “represent” the customer), and there’s no single one which captures all that the test team does, but one thing which is quite common is to find and report bugs in the products before they ship. And the goal of reporting those bugs is to get them fixed, so that the users don’t end up finding them. This isn’t a perfect science - I don’t need to remind anyone of that, but quite a lot of Microsoft products have bugs on them, and I dare say almost all software today does – but every bug the test team finds and gets fixed is one less bug that the customers will encounter. In order to get the bug fixed, it’s common to provide the developers a way to reproduce the issue (a “repro”), so that they can track down the cause and fix the code.

Side note: this is the account of my experience in the teams I worked at; it’s likely that other teams in Microsoft use different practices and may even have different rules.

What exactly is a “repro”? That really depends on which product team you’re on. If you’re on a team which builds something such as Microsoft Word, the repro can be a series of steps which cause the bug to show up (e.g., create a new document, select font “abc”, type text “these characters with accents will trigger the bug áéíóíú”, verify that the letters don’t show up with accents as they should). In framework teams (which is my case), a repro is usually some code which the developer can run and verify the problem. The regression / integration tests are code, so in theory we could simply point the failing test to a developer and let them go at it. The problem with this approach, as I mentioned before, is that running the code could involve quite a few steps, and the harder you make for a bug to be reproduced, you decrease the odds of it being fixed (which is the ultimate goal, after all). So what I tended to do, and what most developers I worked with liked, was to create a stand-alone project (or even a single .cs/.vb/.cpp file) which could quickly run and show the issue.

Now, having a self-contained test is a tricky thing. We can have a common “helper” library which all “repros” use, but that isn’t as simple as a “single-file repro”. We can also copy/paste the relevant library code into the single-file repro, along with the code which exercises the framework and causes the bug, but doing this for all bugs we open tends to involve an extra copying step for every single bug. For serialization bugs, having the code contained in each type actually made opening bugs easier – just copy/paste the class(es) which has a problem into the description of the bug tracking software. So I accepted that some code duplication, if managed by a centralized place (such as a type library creator) was ok. It was after I realized that central control doesn’t work that I reversed my position and started moving to the more purist (reuse – good; multiple copies – bad) side.

Nowadays I still try to make the bug repro as simple self-contained as possible, even occasionally paying the “penalty” of copying over some common code. Having some templates are also time-savers when opening bugs, and that somewhat reduces the “penalty” of the self-contained repro. The fact that we’re using the same test harness as the development team also helps, as in some cases if I’m really busy with other tasks I can just point it to our test harness as well.

Mistakes and what we learn from them

Even though I’m officially now a “senior” tester (at least that’s what my title says, not that’s worth anything), I’ll continue making mistakes, some probably as basic (or even worse) than this one and hopefully learning from them. Some of them will be public, and some will be internal. I’ll continue posting them here, even if that means inviting ridicule from the internet, as most comments actually have some valuable information (one of the comments pointed out a bug I had in my previous post, which was a missing Select statement) – and the nice increase in page views that comes with it.