So despite the miserably long time it takes to run checkin tests, I've still managed to get 2 checkins through. The downside is that both of those checkins resulted in bugs. On the plus side, one of those bugs technically wasn't caused by me, but was merely exposed by me.
The other interesting point was that both failures should have been caught when I ran the checkin tests. In retrospect, I think they were, but I just didn't notice.
So how does a person just not notice checkin test failures? Well, due to the large number of tests, there are actually several that are expected to fail. The causes include bugs in the JIT, bugs in the runtime, out-of-date tests, out-of-date dependencies, and machine configuration. So instead of looking at which tests fail and which tests pass, we look at which tests regressed and which tests improved.
There is a rolling build system that supposedly provides baselines, but it still doesn't take machine configuration into account. So basically, after the checkin tests run, I have to manually go over the list of failing tests and figure out whether each one is really a regression or is failing for some other acceptable reason.
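The regressed-vs-improved comparison boils down to a set difference between the baseline's failures and the current run's failures. Here's a minimal sketch of that triage; the names (`baseline_failures`, `current_failures`, the sample test names) are all made up for illustration, not from any real harness:

```python
def classify(baseline_failures, current_failures):
    """Compare a test run against a baseline run's known failures."""
    baseline = set(baseline_failures)
    current = set(current_failures)
    return {
        # Failing now but not in the baseline: these need investigation.
        "regressions": sorted(current - baseline),
        # Failing in the baseline but passing now: things got better.
        "improvements": sorted(baseline - current),
        # Failing in both: expected failures, safe to skim past.
        "known": sorted(current & baseline),
    }

result = classify(
    baseline_failures=["t_jit_01", "t_cfg_03"],
    current_failures=["t_cfg_03", "t_gc_07"],
)
print(result)
```

The machine-configuration problem is exactly why this simple diff isn't enough in practice: if the baseline came from a differently configured machine, its failure set isn't directly comparable to mine.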
When all is said and done, I'd have to say this new checkin system I've inherited could benefit greatly from some stability enhancements. Likewise, I'm sure I'll get better at distinguishing between noise failures and true regression failures. One feature I'd really like is a way to tag each failure with a bug #. Then the test harness would only note it as a failure if a test fails without a bug #. If it was really advanced, it could even distinguish between different kinds of failures (did the assert change, did it AV instead of assert, etc.). Then, when I run my checkin tests for a change that supposedly fixes a bug, it could even tell me if I didn't fix all of the tests that were failing for the same reason.
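That wished-for feature could be sketched roughly like this: a table mapping each known-failing test to a bug # and a failure kind, so the harness flags only untagged failures or failures whose kind changed. Everything here (the table, test names, bug numbers, the `triage` and `unfixed_tests` helpers) is hypothetical, just to show the shape of the idea:

```python
# Hypothetical known-failures table: test name -> bug # and failure kind.
KNOWN_FAILURES = {
    "t_cfg_03": {"bug": 1234, "kind": "assert"},
    "t_gc_07":  {"bug": 5678, "kind": "av"},
}

def triage(test, kind):
    """Decide whether a failing test needs attention."""
    known = KNOWN_FAILURES.get(test)
    if known is None:
        return "regression"        # failing with no bug # on file
    if known["kind"] != kind:
        return "changed-failure"   # e.g. it AVs now instead of asserting
    return "known"                 # expected failure, suppress the noise

def unfixed_tests(bug_number, still_failing):
    """After a fix for bug_number, list its tagged tests that still fail."""
    return sorted(
        t for t in still_failing
        if KNOWN_FAILURES.get(t, {}).get("bug") == bug_number
    )
```

With something like this, a run after my "fix" for bug 1234 could report any tests tagged 1234 that are still failing, telling me at checkin time that I didn't fix everything failing for the same reason.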