The kind of bug I like to see

The other day a customer reported the first "real" bug in Visual Studio Tools for Office. I put "real" in quotes because there are several "known issues" with the product (installation, non-default locale calls, etc.) but these are what we call "by design" things -- it works the way it is designed to work, but that's not necessarily the way the customers (or, in some cases, the Program Managers / Testers <g>) expect it to work. We generally try to fix these types of bugs when it makes sense (by changing the design, obviously), but often "by design" bugs are not bugs at all.

For example, consider the bug report "When I rename my VSTO assembly from WordProject78.dll to ResumeBuilder.dll, it fails to load". The customer thinks it's a bug, because it's not the behaviour they were expecting, but it's by design and won't be fixed. The main developer on JScript .NET (not Eric Lippert, actually, although he certainly had a lot to do with it :-) ) used to long for a bug resolution of "By Bad Design", meaning that the software was working to spec but the spec itself was bad.

Anyway, back to the story... on this thread you can read about the bug that was found by a customer. I'll refer to it as "a bug caused by security measures" rather than a "word-beginning-with-s-and-ending-in-ecurity bug" because otherwise some people (<cough>journalists</cough>) will assume there is some kind of "hole" in the product. There is not (that we know of :-) ).

The problem is that we are a little bit overzealous in our security enforcement, and have accidentally revoked too many permissions from some specific types of auto-generated code. (Actually, to be more accurate, there is a problem in the way that some CLR components are implemented, and these problems manifest themselves as security exceptions in VSTO projects. VSTO itself is doing nothing wrong.)

I need to share a bit more here.

A short while ago I wrote a blog about the VSTO security enforcement algorithm. In that entry I told the truth and nothing but the truth, but I didn't quite tell the whole truth. It was not a conscious decision -- honest! -- but simply a slip of the mind. What I described was the Platonic Form of the VSTO model in all its glory. The way the system would work in a perfect world. The way the system was originally implemented.

But of course the real world is not perfect, and just as you or I cannot draw a circle, we cannot implement the VSTO model without some slight modifications. One interesting modification which I'll mention for completeness' sake (not because it has anything to do with this blog, but because I hope not to have to write yet another addendum to VSTO security in another blog) is that in addition to cloning the persistent policy levels, we also add the list of FullTrust assemblies to the AppDomain policy. This is basically a list of assemblies that actually implement the security policy, so they must be granted FullTrust without reference to the normal policy hierarchy to avoid circular references (MyCustomCodeGroup is in an assembly that relies on SomeOtherClass, but SomeOtherClass is in an assembly which relies on matching MyCustomCodeGroup in order to be granted execution permissions). In fact, we have a doc bug in VSTO because we tell users to add msosec to the GAC, but fail to mention that it should also be added to the FullTrust list (this doesn't end up being an issue for practical purposes, because nothing else depends on msosec, but it's a theoretical bug). It's all my fault, and I'll see that it gets fixed for the next release :-)
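The circular dependency is easier to see in a toy model. The sketch below is a hypothetical Python illustration, not the CLR loader: the assembly names `App.dll`, `CustomGroups.dll` and `Other.dll`, the `REFERENCES` table and the `load` function are all invented for this example. It assumes that loading any assembly that is not on the FullTrust list first requires evaluating policy, which in turn requires loading the assembly that implements the custom code group.

```python
class CircularPolicyError(Exception):
    """Raised when policy evaluation needs an assembly that is still loading."""


# Hypothetical assemblies: CustomGroups.dll implements MyCustomCodeGroup and
# references Other.dll, which implements SomeOtherClass.
REFERENCES = {
    "App.dll": [],
    "CustomGroups.dll": ["Other.dll"],
    "Other.dll": [],
}

# Assemblies that must be loaded before any policy decision can be made.
POLICY_ASSEMBLIES = ["CustomGroups.dll"]


def load(name, full_trust_list, loading=frozenset()):
    """Load `name` and its references, evaluating policy where required."""
    if name in loading:
        raise CircularPolicyError(f"circular dependency: {name} is needed while still loading")
    loading = loading | {name}
    if name not in full_trust_list:
        # Resolving policy for this assembly means running the custom code
        # groups, so the assemblies that implement them must be loaded first.
        for policy_assembly in POLICY_ASSEMBLIES:
            load(policy_assembly, full_trust_list, loading)
    for reference in REFERENCES[name]:
        load(reference, full_trust_list, loading)
    return name
```

With both policy assemblies on the list, `load("App.dll", {"CustomGroups.dll", "Other.dll"})` succeeds. Leave `Other.dll` off and the same call raises `CircularPolicyError`, because checking `Other.dll` needs `CustomGroups.dll`, which is itself still waiting for `Other.dll`.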

Back to the real issue again...

So the second thing we do to policy in VSTO, and the reason I'm writing this blog, is that we add a couple of additional rules to the cloned policy levels in order to fix up a few assemblies. And these are kind of important assemblies too, like, say, the Office PIAs. The problem is that default machine policy trusts the Microsoft strongname and the ECMA strongname, but not the Office strongname (yes, Office uses a different key to sign their assemblies than the CLR / VS teams use). And Office setup does not add their key to policy, relying instead on the default MyComputer: FullTrust rule that applies to most kinds of applications.

Except, of course, VSTO applications (remember we turn off permissions granted by MyComputer Zone evidence).

So we add a special rule to the AppDomain policy that grants FullTrust to the Office strongname, thereby allowing the Office PIAs to execute. Now before anyone jumps up and down and says "What if I don't want to trust Office?!? You bad monopolistic bad untrustworthy bad people. I should be in control of my security policy, not you!" remember that the AppDomain policy cannot grant permissions to code that are not already granted by Enterprise-, Machine-, and User-Level policy. So if you choose to explicitly deny permissions to the Office PIAs in your persistent policy then our AppDomain rule will be thrown away and VSTO solutions will fail to work correctly (which was presumably your motivation for denying permissions to Office PIAs).
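The intersection rule can be sketched in a few lines. This is a hypothetical Python model, not the .NET security API: permission sets are plain `frozenset`s of permission names, evidence is a set of strings such as `"StrongName:Office"`, and each policy level is a function from evidence to granted permissions. The final grant is whatever every level agrees on, so the VSTO-style AppDomain rule for the Office strong name only matters if the persistent levels already grant those permissions.

```python
EXECUTION = "Execution"
SERIALIZATION_FORMATTER = "SerializationFormatter"
UNMANAGED_CODE = "UnmanagedCode"
# Stand-in for FullTrust: every permission this toy model knows about.
FULL_TRUST = frozenset({EXECUTION, SERIALIZATION_FORMATTER, UNMANAGED_CODE})
NOTHING = frozenset()


def enterprise(evidence):
    # Default enterprise policy: all code gets FullTrust.
    return FULL_TRUST


def machine(evidence):
    # Default machine policy (simplified): MyComputer zone code gets FullTrust.
    return FULL_TRUST if "Zone:MyComputer" in evidence else NOTHING


def user_default(evidence):
    # Default user policy: all code gets FullTrust.
    return FULL_TRUST


def user_denies_office(evidence):
    # A user who has explicitly locked out the Office strong name.
    return NOTHING if "StrongName:Office" in evidence else FULL_TRUST


def vsto_appdomain(evidence):
    # VSTO-style AppDomain level: no MyComputer rule, but an extra rule that
    # grants FullTrust to the Office strong name.
    return FULL_TRUST if "StrongName:Office" in evidence else NOTHING


def resolve(levels, evidence):
    """Intersect the grants of all policy levels."""
    grant = FULL_TRUST
    for level in levels:
        grant &= level(evidence)
    return grant


office_pia = {"Zone:MyComputer", "StrongName:Office"}
print(sorted(resolve([enterprise, machine, user_default, vsto_appdomain], office_pia)))
# ['Execution', 'SerializationFormatter', 'UnmanagedCode']
print(sorted(resolve([enterprise, machine, user_denies_office, vsto_appdomain], office_pia)))
# []
```

The second call shows the point made above: the AppDomain rule is still present, but the user-level denial wins because the grants are intersected.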

The other problem we found was that web service calls -- one of the cool things you can do with .NET -- didn't work under VSTO. Oops, that's kind of bad. I can see the headlines now:

New Microsoft Office System 2003: Now with built-in .NET support; can't call web services!

The problem here is that in order to work with a web service, the CLR generates and loads a small "helper" assembly at run-time that is used to massage the data passed to and from the web service via XML. Note that this is different to the web service proxy generated by wsdl.exe or the Visual Studio "Add Web Reference" dialog, which essentially gives you a strongly-typed wrapper around weakly-typed invokes into the web service runtime. The dynamically-generated proxy is built and loaded by the XML Serialisation runtime, and of course since the proxy is a .NET assembly it needs permission to execute, and in order to be granted permission to execute it must have some suitable evidence. But what evidence do you give to a temporary dynamically-generated assembly? Good question, and there's not necessarily a good answer (discuss <g>).

Anyway, to cut a long story, errr, long, the proxy is loaded with three pieces of evidence:

1) MyComputer Zone
2) Full path to System.dll in the GAC
3) Unique hash of the assembly

Ordinarily, the first piece of evidence works, but the other two are useless because there are no rules for either of them in default policy. And since VSTO doesn't honour MyComputer zone evidence, the proxy can't load, which means the web service call can't be made. Solution? Trust one of the other pieces of evidence. Obviously we can't trust the hash, since that is unique per assembly and so we can't pre-compute it. But we can pre-compute the full path to System.dll, so that's what we do. We add another rule to AppDomain policy (along with the Office strongname rule) that grants Execution permission to any code whose evidence includes the full GAC path to System.dll.
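Matching rules against evidence works roughly like this. In the hypothetical Python sketch below (not the CLR API), a policy level is a list of `(condition, permissions)` rules, a level's grant is the union of the permissions of every rule whose condition appears in the evidence, and `SYSTEM_DLL_PATH` is a placeholder rather than a real GAC path. With the MyComputer rule switched off, only the System.dll path rule matches the proxy, so it gets Execution and nothing more.

```python
import hashlib

EXECUTION = "Execution"
SERIALIZATION_FORMATTER = "SerializationFormatter"
FULL_TRUST = frozenset({EXECUTION, SERIALIZATION_FORMATTER, "UnmanagedCode"})

# Placeholder only: the real value is the full GAC path of System.dll.
SYSTEM_DLL_PATH = "<GAC path>/System.dll"

MY_COMPUTER = ("Zone", "MyComputer")
OFFICE_KEY = ("StrongName", "Office")
SYSTEM_DLL = ("Path", SYSTEM_DLL_PATH)

# Simplified default machine policy: MyComputer zone code is fully trusted.
DEFAULT_RULES = [(MY_COMPUTER, FULL_TRUST)]

# VSTO-style AppDomain rules as described above: no MyComputer rule, plus
# FullTrust for the Office key and Execution for the System.dll path.
VSTO_RULES = [(OFFICE_KEY, FULL_TRUST), (SYSTEM_DLL, frozenset({EXECUTION}))]


def level_grant(rules, evidence):
    """Union the permissions of every rule whose condition matches the evidence."""
    grant = frozenset()
    for condition, permissions in rules:
        if condition in evidence:
            grant |= permissions
    return grant


# The three pieces of evidence attached to the generated proxy.
proxy_bytes = b"dynamically generated proxy assembly"
proxy_evidence = {
    MY_COMPUTER,
    SYSTEM_DLL,
    # Depends on this proxy's exact bytes, so no rule can be written in advance.
    ("Hash", hashlib.sha1(proxy_bytes).hexdigest()),
}

print(level_grant(DEFAULT_RULES, proxy_evidence) == FULL_TRUST)  # True
print(sorted(level_grant(VSTO_RULES, proxy_evidence)))  # ['Execution']
```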

And herein lies the problem that I set out to describe at the start of this overly-long entry... it turns out that whilst web service calls only require Execution permission, if you try to do XML Serialisation inside a VSTO solution then it also generates dynamic proxies, and those proxies need more than just Execution. My guess is that they need at least SecurityPermissionFlag.SerializationFormatter (note the incorrect spelling ;-) ) but it could really be anything.

This problem is caused by an overly-aggressive invocation of the "run with least privilege" mantra -- don't grant code more permissions than it needs to execute. We could have simply added a rule to the AppDomain policy that said System.dll : FullTrust, but that seemed "wrong". We had observed (and verified with the web services team) that the proxy only needed Execution permission to get its job done, therefore giving it anything else was superfluous and a potential source of future security bugs (in the pejorative sense).

Now in the grand scheme of things, and as I've mentioned many times before, the AppDomain can't grant permissions that aren't otherwise granted by persistent policy, and realistically if System.dll isn't granted FullTrust then not much is going to work on your machine, so it would have been OK to grant System.dll FullTrust in our modified policy. But it's the principle that counts. Vigilant, consistent invocation of security practices is the best (only?) way to build secure software.

Luckily, there is a fairly straightforward (if a little annoying) work-around for the problem -- simply add a rule to policy (at any level...) that grants FullTrust to the full GAC path of System.dll. Voila!
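To see why the work-around helps, assume (as described earlier in this entry) that VSTO builds its AppDomain policy by cloning the persistent rules, dropping the MyComputer rule, and appending its own extra rules. The hypothetical Python sketch below models that; `build_vsto_rules`, `SYSTEM_DLL_PATH` and the rule tuples are invented for illustration, not taken from VSTO or the CLR. Adding a persistent rule that grants FullTrust to the System.dll path means the clone carries it into the AppDomain, so the proxy's grant grows from Execution to FullTrust, which includes SerializationFormatter.

```python
EXECUTION = "Execution"
SERIALIZATION_FORMATTER = "SerializationFormatter"
FULL_TRUST = frozenset({EXECUTION, SERIALIZATION_FORMATTER, "UnmanagedCode"})

SYSTEM_DLL_PATH = "<GAC path>/System.dll"  # placeholder, not a real path
MY_COMPUTER = ("Zone", "MyComputer")
OFFICE_KEY = ("StrongName", "Office")
SYSTEM_DLL = ("Path", SYSTEM_DLL_PATH)

# Extra rules VSTO appends to the cloned policy (as described in the text).
VSTO_EXTRA_RULES = [(OFFICE_KEY, FULL_TRUST), (SYSTEM_DLL, frozenset({EXECUTION}))]


def build_vsto_rules(persistent_rules):
    """Clone the persistent rules, drop the MyComputer rule, add VSTO's extras."""
    cloned = [rule for rule in persistent_rules if rule[0] != MY_COMPUTER]
    return cloned + VSTO_EXTRA_RULES


def level_grant(rules, evidence):
    """Union the permissions of every rule whose condition matches."""
    grant = frozenset()
    for condition, permissions in rules:
        if condition in evidence:
            grant |= permissions
    return grant


proxy_evidence = {MY_COMPUTER, SYSTEM_DLL}
default_policy = [(MY_COMPUTER, FULL_TRUST)]
patched_policy = default_policy + [(SYSTEM_DLL, FULL_TRUST)]  # the work-around

before = level_grant(build_vsto_rules(default_policy), proxy_evidence)
after = level_grant(build_vsto_rules(patched_policy), proxy_evidence)
print(SERIALIZATION_FORMATTER in before, SERIALIZATION_FORMATTER in after)  # False True
```

In this model the persistent levels already grant the proxy FullTrust through its MyComputer evidence, so the AppDomain level is the only one that was holding it back.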

The reason I am "happy" about this bug is that all software has bugs, and I've been dead scared for the past year or more that there would be a security bug in VSTO. The fact that a bug was found, and that it was related to security, but that it was a "fail closed" (block good things) rather than a "fail open" (allow bad things) bug, means I might be able to sleep a little better at night. For a while.

P.S. My friend Jeff "Won't Fix" Davis asked to be mentioned in this blog. There you go, Jeff! I didn't even know he had a blog until right now...