More thoughts on error tolerance.

I've posted before about error tolerance and why I consider it a very important part of VC#.  This time I want to talk about some of the difficulties that arise when you go down that route.

I'm going to start with a bug we recently discovered and how error tolerance helped it go unnoticed.  Imagine you are writing:

using System;

class MyAttribute : Attribute {
    public MyAttribute(DateTime dateTime) { /* ... */ }
}

[My( //<-- typing here

We're going to show you parameter help for the "MyAttribute" constructor right there.  In order to do that we first need to figure out what "My" is.  To do that we go to:

[

and we try to figure out everything that's valid there.  Because it's the beginning of an attribute, we know that only types that extend System.Attribute are valid there, and that if you've typed a name like "Foo" then we need to search for types named Foo or FooAttribute.  Once we've done that we find all the valid attribute constructors and build up the parameter help tooltip that we want to show you.  Part of that process is figuring out how to display each parameter's type.  You might think it's always the same, but that's not actually the case.  If you have an empty file and you start typing:

[My(

then you'll see "My(System.DateTime dateTime)".  If you have:

using System;

[My(

then you'll see "My(DateTime dateTime)"; and if you have:

class System {
    [My(
}

then you'll see "My(global::System.DateTime dateTime)"; and if you have:

using System;
using SomeOtherNamespaceThatIncludesDateTime;

[My(

then you'll see "My(System.DateTime dateTime)", and so on.

As you can see, the only difference is how we qualify the type of the constructor's parameter.  We try to use the simplest type name possible to give you a clear and concise tooltip.  As it turns out there was a small bug in how we did this.  We had already built up the list of types that are valid after the '[', and we mistakenly used that same list when trying to figure out the simplest type name for the parameter.  Because that list only contains attribute types, we weren't even able to bind to the type "System.DateTime" (it doesn't extend "Attribute"), and so figuring out the simplest type name wasn't possible.  Now, in the past we took a very error-tolerant route and just output the fully qualified name to the tooltip.  Unfortunately that decision helped this bug go unnoticed.
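To make the bug a bit more concrete, here is a minimal sketch of the two steps.  It is not the real VC# code: the TypeInfoSketch and TooltipSketch names, the "MyApp" namespace and the way types are modeled are all made up for illustration, and the "global::" case from above is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a declared type: namespace, simple name,
// and whether it extends System.Attribute.
public sealed class TypeInfoSketch
{
    public string Namespace { get; }
    public string Name { get; }
    public bool IsAttribute { get; }

    public TypeInfoSketch(string ns, string name, bool isAttribute)
    {
        Namespace = ns;
        Name = name;
        IsAttribute = isAttribute;
    }

    public string FullName => Namespace + "." + Name;
}

public static class TooltipSketch
{
    // Step taken at '[': keep only attribute types named "Foo" or "FooAttribute".
    public static List<TypeInfoSketch> AttributeCandidates(
        string typedName, IEnumerable<TypeInfoSketch> allTypes)
    {
        return allTypes
            .Where(t => t.IsAttribute)
            .Where(t => t.Name == typedName || t.Name == typedName + "Attribute")
            .ToList();
    }

    // Uses the short name only when exactly one visible type with that name
    // comes from an imported namespace and it is the target itself.
    // Otherwise falls back to the fully qualified name (the tolerant behavior).
    public static string SimplestName(
        TypeInfoSketch target,
        ICollection<string> usings,
        IEnumerable<TypeInfoSketch> visibleTypes)
    {
        var matches = visibleTypes
            .Where(t => t.Name == target.Name && usings.Contains(t.Namespace))
            .ToList();

        bool unambiguous = matches.Count == 1 && matches[0].FullName == target.FullName;
        return unambiguous ? target.Name : target.FullName;
    }
}
```

With "using System;" in scope and the full list of types passed as visibleTypes, SimplestName returns "DateTime" for System.DateTime.  Pass the result of AttributeCandidates instead (the mistake described above) and DateTime is never found, so the fallback quietly returns "System.DateTime" and nothing looks broken.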

The number of different IntelliSense features is huge, and when we talk about the "matrix" (how each feature interacts with every other feature) you can be staggered by how many different interactions there are to consider and test.  So in this case there probably wasn't a specific test for this exact functionality, and even I never noticed that anything was wrong when I was using attributes (who really thinks anything is wrong when they see "System.DateTime" instead of "DateTime"?).

So how did this bug get caught?  Well, a little while back Kevin decided to change the logic for finding simplified type names a bit.  Instead of being error tolerant and choosing the fully qualified name when we can't find the best type name, we now fail fast.  This failure tends to propagate up a while until it is caught, and it tends to manifest itself as very obviously broken functionality.  In this case, parameter help might be missing completely.  After making this change we received a large number of error reports on many features, from "extract interface" to the "code definition window" (there are a heck of a lot of features that sit on top of this functionality).
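The difference between the two approaches can be sketched in a few lines.  This is only an illustration: BindingSketch and the trySimplify delegate (which returns null when it can't bind the type) are hypothetical stand-ins, not the actual VC# logic.

```csharp
using System;

public static class BindingSketch
{
    // Tolerant: if simplification fails (null), quietly show the fully qualified name.
    public static string DisplayNameTolerant(Func<string, string> trySimplify, string fullName)
    {
        return trySimplify(fullName) ?? fullName;
    }

    // Fail fast: a failed simplification throws, so callers higher up
    // visibly lose the feature instead of showing a slightly worse result.
    public static string DisplayNameFailFast(Func<string, string> trySimplify, string fullName)
    {
        string simple = trySimplify(fullName);
        if (simple == null)
        {
            throw new InvalidOperationException("Could not simplify type name: " + fullName);
        }
        return simple;
    }
}
```

With a trySimplify that always returns null, the tolerant version hands back "System.DateTime" and nobody notices, while the fail-fast version throws and whatever feature sits on top of it breaks in an obvious way.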

The question that I'm debating right now is how to balance error tolerance against ease of finding problems, and how we should ship VC#.  Should we ship with this error-intolerant behavior, where bugs are immediately apparent but sometimes functionality is completely broken?  That's great for finding problems, but if we don't provide patches in a timely manner then we're just making life painful for you.  Maybe we could use the SQM system to get notified about these problems so we can fix them up.  However, SQM is extremely limited in the information it can send, so I don't know if it will be able to help here.  For now I think that while we're in extreme bug finding/fixing mode it's very worthwhile to cut down on tolerance, and then as we get closer to shipping (but still have enough time to fully test everything) we go back to a tolerant mode.
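One way to picture that switch is a single code path with a mode flag, where the tolerant mode still tells someone that the fallback happened.  Again this is just a sketch under assumptions: ToleranceMode, ToleranceSwitchSketch and the report callback are hypothetical, and the callback merely stands in for whatever notification channel is actually available.

```csharp
using System;

public enum ToleranceMode { FailFast, Tolerant }

public static class ToleranceSwitchSketch
{
    // trySimplify returns null when the type can't be bound.
    // report is a hypothetical hook for recording that a fallback was used.
    public static string DisplayName(
        Func<string, string> trySimplify,
        string fullName,
        ToleranceMode mode,
        Action<string> report)
    {
        string simple = trySimplify(fullName);
        if (simple != null)
        {
            return simple;
        }

        string message = "Could not simplify type name: " + fullName;
        if (mode == ToleranceMode.FailFast)
        {
            // Bug-finding mode: make the failure impossible to miss.
            throw new InvalidOperationException(message);
        }

        // Shipping mode: keep the feature working, but don't swallow the error silently.
        report(message);
        return fullName;
    }
}
```

The point of the design is that the fallback is never silent: either it throws, or it leaves a trace that someone can look at later.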

This is definitely something I think we need to focus on more when developing things in the future.  Rather than blindly swallowing errors in order to give the user a pretty decent experience, we should figure out how to get at that information (like Watson) so that we can fix the bugs as well.  Anyone have any experience with this sort of thing?  How do you get the best of both worlds for your users?