Code behaving badly

Wow, two programming/C# entries in a single day...

Before I joined the C# team, I was the test lead for the C++ compiler for a number
of years. We would periodically get customer comments that "the compiler was
broken", and upon further investigation, we would usually find that it was a bug in
the program. There was usually a good correlation between the programmer's experience
and where they looked first - those with more experience normally suspected their code first,
and only after careful research would consider the compiler (and they were usually
right at that point).

One of the nice things about C# not having pointers is that it's much
harder to accomplish bad things ("Try to imagine all life as you know it stopping
instantaneously, and every molecule in your body exploding at the speed of light".
Shame on you if you don't recognize the quote). If you're playing the interop game,
you're back in the pointer-world of sharp sticks, and you can easily create the otherwise
elusive "Execution Engine Error".

Last week, I upgraded to build 30730 of VS and the runtime. (This means "third year,
seventh month, and 30th day", and is also known in Microsoft parlance as the
"Julian Date", even though it isn't a Julian date. This replaced our previous scheme
(also not a Julian date) that we used on VS 2002 and 2003, which replaced the scheme
we used in VS6 (also not a Julian date). As far back as I remember, our numbers have
always been called Julian dates but never were. An ideal dating system is monotonically
increasing by 1 (so you can tell how far apart builds are) and easy to convert to
human-readable dates (so you know when the build was created), but that's not really
possible, so at least we've finally settled on something where you know when the build
was, and it works for more than a couple of years (previous versions broke badly when
confronted with the long dev cycle of VS 2002). It's a testament to the understandability
of the previous schemes that I don't remember what they were, but I do know that many
people ran little JDate applications on their desktops so they knew what jdate to use
for today. But I digress.)
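
For the curious, here is a rough sketch of how a build number in this scheme could be decoded. It is not the tool anybody actually uses; the BuildNumber class and ToDate method are names made up for illustration, and counting years from 2000 is my assumption based on "third year" meaning 2003.

```csharp
using System;

// Hypothetical helper, not an actual Microsoft tool.
static class BuildNumber
{
    // Decodes a build number laid out as [year][MM][DD], e.g. 30730.
    // Assumption: the year digits count from 2000.
    public static DateTime ToDate(int build)
    {
        int year = 2000 + build / 10000;   // leading digit(s): years since 2000
        int month = (build / 100) % 100;   // middle two digits: month
        int day = build % 100;             // last two digits: day of month
        return new DateTime(year, month, day);
    }

    static void Main()
    {
        Console.WriteLine(ToDate(30730).ToString("yyyy-MM-dd")); // prints "2003-07-30"
    }
}
```

Note that the numbers only increase over time; they don't increase by 1, so you still can't subtract two of them to see how many builds apart they are.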

I got the new build on, and nothing broke (a nice occurrence), rebuilt, and ran
my app. It worked fine in most areas, but when I tried to use one function, I got
a null reference exception. Of course, I initially thought my code was bad,
but a little debugging narrowed the problem down to an innocuous-looking function:

        private void CheckType<T>(DBObject node, List<int> list)
        {
            if (node is T)
            {
                if (node.Checked)
                {
                    list.Add(node.ID);
                }
            }
        }

In my app, I have a treeview with different node types in it, and I need to get
all the checked nodes of a given type into a list so I can persist it. This function
is called for each node and each type of node, and it fills in the items.
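
To make the calling pattern concrete, here is a minimal sketch of how such a function might be driven. The real node classes aren't shown in this post, so the DBObject, TableNode, and ViewNode types below are hypothetical stand-ins, and the real DBObject presumably has a lot more in it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the real node classes.
class DBObject
{
    public int ID;
    public bool Checked;
}

class TableNode : DBObject { }
class ViewNode : DBObject { }

static class CheckedNodeCollector
{
    // Same logic as the function above: record the ID of a checked node of type T.
    public static void CheckType<T>(DBObject node, List<int> list)
    {
        if (node is T && node.Checked)
        {
            list.Add(node.ID);
        }
    }

    static void Main()
    {
        var nodes = new List<DBObject>
        {
            new TableNode { ID = 1, Checked = true },
            new ViewNode  { ID = 2, Checked = true },
            new TableNode { ID = 3, Checked = false },
        };

        var checkedTables = new List<int>();
        var checkedViews = new List<int>();

        // Call once per node and per node type, as described above.
        foreach (DBObject node in nodes)
        {
            CheckType<TableNode>(node, checkedTables);
            CheckType<ViewNode>(node, checkedViews);
        }

        Console.WriteLine(string.Join(",", checkedTables)); // prints "1"
        Console.WriteLine(string.Join(",", checkedViews));  // prints "2"
    }
}
```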

All the parameters were correct when passed in, but once they got into the function,
list was nowhere to be found, and calling list.Add() caused problems. Since this code
worked before and the debugger couldn't find list, I started to suspect a code generation
problem. Further investigation showed that even if list.Add() was never called, the
program would blow up at some future point.

I just finished a session with one of the CLR guys to try to find the root cause and
get a small repro case (small repro cases are the holy grail of tracking code generation
issues). He knew that there had been some changes in JITting generic methods when
one of the parameters was a MarshalByRef type, and we were able to create a small
project that throws an Execution Engine Error at will. That will allow us to find the problem
and get it fixed.

The moral of the story - and I'm sure if you've read this far you're expecting a moral
- is that while it's usually your code that has the problem, sometimes it's the underlying
system that has issues, so don't be too trusting...