How Was Your App Compat and Visual Studio 2005 Project Upgrade Experience?

We have been spending a huge amount of time working on application compatibility between V1.x and V2.0 in the .NET FX, as well as Project Upgrade in Visual Studio 2005. As the Windows folks can tell you, this is a never-ending task and very hard to get right. Oftentimes bugs in user software are exposed by new versions. In other cases questionable usages of features can break you. And sometimes we simply have to break you (to close a security hole, for example). I am posting some background information here, and I have some questions for you at the end.

Types of Compatibility and Common Issues

There are several levels of compatibility that we look at:

  • Binary Compat - the ability to run a compiled application unaltered on a newer version of the .NET FX
  • Project Upgrade - the ability to take an older Visual Studio project and upgrade it to the latest version
  • Source/API Compat - once you upgrade your project, does the source compile as is?
  • Browser Compat - the Asp.Net team makes sure their stuff works on all sorts of browsers (yes, that definitely includes non-Microsoft platforms and browsers)

Binary Compat

I will concentrate on the thing my team owns: binary compat for the runtime pieces. My ideal place to be for Binary Compat is that 100% of the code you write on V1.x will work unaltered on V2.0. The only place we knowingly stray from this goal is security: we will cause a breaking change if it makes your computer and the net more secure. In any case where we must make a change like this we will of course document it to the best of our ability. It is a decision we take very, very seriously. 

Here are some types of issues we've seen in user applications:

Private Members - applications that grab private members (interfaces, methods, or fields) from .NET FX classes are not guaranteed to work in the future. As we make improvements and innovate in the code, a developer may totally throw out internal algorithms. These private members are only exposed through Reflection for purposes like serialization. And features like serialization must be hardened against this kind of change in state.

Corruption - every once in a while you will find a piece of code that uses P/Invoke to call out to unmanaged code. If the signature is incorrect and causes uninitialized data to be used in your program, subtle changes in stack layout can cause you problems. I highly recommend using sites like www.pinvoke.net to ensure you are using the proper declarations, and of course preferring the fully managed alternatives wherever they exist.

Versioning Logic - the most common app compat shim in Windows itself is to lie about the OS version number. This typically happens because some piece of software is doing an equality check that can't handle a newer version. So for example, if your app can run on V2.0, will your installer actually let it? You may have explicit reasons for not allowing it, but make sure it is intentional.
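To make the equality-check problem concrete, here is a minimal sketch in Python (chosen only for brevity; the same logic applies to any installer or launcher language). The function names and the choice of 1.1.4322 as the minimum version are assumptions for illustration, not anything the .NET FX ships.

```python
def parse_version(text):
    """Turn a string like 'v2.0.50215' or '1.1.4322' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


# Hypothetical minimum runtime version for this example.
MINIMUM_RUNTIME = parse_version("1.1.4322")


def brittle_check(installed):
    # Accepts only the exact version the app was built against,
    # so any newer runtime is rejected.
    return parse_version(installed) == MINIMUM_RUNTIME


def tolerant_check(installed):
    # Accepts the minimum version and anything newer.
    return parse_version(installed) >= MINIMUM_RUNTIME
```

With the brittle check, an installer refuses a machine that only has v2.0.50215 even though the app might run there; the tolerant check lets it through. If you really do need to block a newer version, make that a separate, explicit rule so it is clearly intentional.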

Version Tolerant Serialization - if you use Remoting, you will want to be aware of VTS. This video from Matt Tavis explains many features including VTS (seek around 4:45 and 19:30 in the demos). This feature (which can be applied to V1.x code) allows you to remove brittleness around the types you can serialize/deserialize. If your application moves forward to Whidbey, this could otherwise impact you.
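The idea behind version tolerance can be sketched outside of .NET. The Python below is not the VTS API; it only illustrates the general principle of a reader that ignores fields it does not know and supplies defaults for optional fields an older writer never sent. The field names and the JSON format are made up for the example.

```python
import json

# Fields this hypothetical reader understands, with defaults for the ones
# that older writers may not have emitted.
KNOWN_FIELDS = {"name": None, "quantity": 0, "currency": "USD"}
REQUIRED_FIELDS = {"name"}


def tolerant_load(payload):
    """Deserialize a record without failing on extra or missing optional fields."""
    raw = json.loads(payload)
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError("missing required fields: %s" % sorted(missing))
    # Start from the defaults, then copy only the fields we understand.
    # Anything a newer writer added is ignored instead of causing a failure.
    record = dict(KNOWN_FIELDS)
    for key in KNOWN_FIELDS:
        if key in raw:
            record[key] = raw[key]
    return record
```

A reader written this way keeps working when the other side of the wire moves to a newer version that adds a field, and when it receives data from an older version that lacks one.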

Hard Coded Paths - I've seen installers hard code their path information to versions of the .NET FX under %windir%\Microsoft.Net\Framework using a precise version. This will break if that version is not installed on the machine. Also, if you update config file settings in one version but wind up running in a different one, those settings are not present.
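One way to avoid baking a precise version into a path is to look at what is actually on the machine. The following Python sketch assumes the caller passes in the framework root directory and that version directories are named like v1.1.4322; the function name is hypothetical, and inside a managed app you would normally ask the runtime which version you are running on instead.

```python
import os


def installed_versions(framework_root):
    """Return version directory names found under framework_root, oldest first."""
    def is_version_dir(name):
        # Accept names such as 'v1.1.4322': a 'v' followed by dotted integers.
        parts = name[1:].split(".")
        return (
            name.startswith("v")
            and all(part.isdigit() for part in parts)
            and os.path.isdir(os.path.join(framework_root, name))
        )

    def sort_key(name):
        # Compare numerically so that v2.0 sorts after v1.1.
        return tuple(int(part) for part in name[1:].split("."))

    names = [n for n in os.listdir(framework_root) if is_version_dir(n)]
    return sorted(names, key=sort_key)
```

An installer can then check whether the version it needs appears in the returned list and fail with a clear message if it does not, rather than assuming the directory exists.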

I have asked the PM team to prepare a talk for TechEd this year to cover issues like this. After TechEd is over and the content is available, I'll post a link to it.

Breaking Change Protocols, Challenges, Solutions

I see press coverage periodically (especially around new SP releases for Windows) that seems to indicate Microsoft in general is lazy/inconsiderate/sloppy around compat, which honestly bugs me quite a bit. Consider that any change I check into the build could be breaking for someone. I have seen existing race conditions in software exposed by actually making the engine run faster!

The general solutions for these kinds of problems usually fall into a few categories.

Virtualization - Systems such as Microsoft Virtual PC allow you to create a simulated PC right down to the hardware involved. This kind of approach is great for basically eliminating compat issues. You literally run your application on what appears to be the original machine + software stack. The problem with this approach is that apps really like to share data with each other. For example, dragging and dropping a spreadsheet object from a VPC into a document I'm writing outside that environment would be very tricky. Getting the communications protocol, security, etc. figured out in this case is non-trivial. However, this kind of system has worked great for decades in the mainframe/mini world where you typically run batch jobs.

Side by Side - Major versions of the .NET Framework run side by side with each other. That is, you can have V1.0, V1.1, and V2.0 all running isolated from each other on the same machine (each must be in its own process). An application will by default use the version of the .NET FX it was built against. I've seen very good results with this approach.

I should point out that hosts can utilize this support as well. Asp.Net, for example, will only load the version of the .NET FX that your site is designed for. You must go into the Asp.Net configuration support to explicitly roll forward your site. The same is true for MS SQL Server.

Suzanne Cook's blog has a lot of material on how to use side by side in the framework. You can use a .config file for example to help explicitly control how your application is deployed and how it loads.

Protocols - We do a tremendous amount of due diligence around breaking changes. We have automated tools that walk all the public APIs for each new build and look for accidental API changes. We revert those quickly to avoid ever shipping them by mistake. In the case where something may be breaking, the change goes through a "Compatibility Council" (three experts with lots of experience in this area) who review it to ensure it won't have an impact and is the right thing for the user. This then ties into documentation you will see in MSDN, etc. It's a lot of work but we believe very strongly we must do this.

Test the Heck Out of It - The final approach we use is sheer testing. We try to get our hands on every application we can find and run it in multiple scenarios. We do code coverage runs to make sure that as much as possible of the code base is hit. But even with the best intent, the surface space of something like the .NET FX is huge. We can always use community help here to ensure we are getting thorough coverage.

We recently held a compat lab here on campus where 19 customers came in to test their code on the new system (at least one participant blogged his experience, day 1 here). Getting in a new set of applications was very helpful in identifying issues. It is especially helpful to see departmental code we would otherwise never get a chance to cover. In two weeks Brad Abrams, Kit George, and a few more of us are headed to Atlanta to meet with customers and get more of this kind of feedback in person.

The Developer - You are a big part of getting compat right! There are a few aspects:

  • Make sure you design your code expecting new versions of everything. Expect that you might wind up on a version of Windows running V5.0 of the .NET FX some day in the future.
  • Same thing goes for your own objects. If you are serializing (either through remoting or through persisted data), make sure to look at the guidelines like using VTS to make sure your code will keep working.
  • Be very wary of hard coding paths to registry data or file systems. You can almost always find a more general way to write that kind of code that avoids it (such as querying for the version you are running).
  • Do all of the code cleanliness techniques: (a) scan with FxCop for any issues, (b) after upgrading your project, make sure to eliminate all usage of APIs that have been marked obsolete, and (c) run with MDAs (Managed Debugging Assistants) to catch non-deterministic or questionable constructs in your code.

Let's Talk Add-Ins

I have left Add-Ins out of this entry until now because they are the trickiest to get right. There can be only one version of the .NET FX loaded into an OS process. Once that version has been loaded, it cannot be unloaded. For a typical managed application, that is no big deal: you load the one you want to work with and everything is fine. As I already mentioned, Asp.Net, SQL Server, and your own managed .exe work this way.

However, for a generic unmanaged host application, things are far more complicated. These hosts go by the name of IE, the Shell, Outlook, Word, Excel, svchost.exe, etc. The default behavior when an unmanaged application loads a managed add-in (like an in-proc COM control) is to pick the latest version of the .NET FX and load that version. This allows your newest controls to run in the process, and with a high app compat bar, your old ones as well. If we loaded V1.1 for example, then there would be no way to run a V2.0 based add-in which might be using generics as a feature.

Unfortunately, side by side doesn't fix this problem because it refers only to allowing more than one version of the .NET FX to run on the machine (but always in a separate process). Take for example two processes that are using .NET Remoting to communicate. Process A is a managed .exe built with V1.1. Process B is an unmanaged host which loads a managed Add-In. Process A will serialize in a V1.1 format, while Process B loads V2.0 and gets the updated format. Because the host was not utilizing VTS, a failure occurs. In this case, if Process B were totally under your control, you could simply add a .config file for the process to lock it explicitly back to V1.1 no matter what. But if the process were IE, you wouldn't want to do this because it means that no V2.0 add-ins could ever be loaded.

Besides the config file option and full-out testing, you can also decide to run your Add-In code out of process. That would allow it to choose its own .NET FX stack to run against. If you do this, you of course need to make sure your communication protocol with the in-process piece is version tolerant.

I should also re-emphasize that as an application writer, you have control over this behavior. So for example, this isn't a problem in the case where you build a departmental style application that uses a Grid and Calendar control. Choose what you want to run with and deploy it that way and you will always be fine. Ditto for an Asp.Net site: you choose what .NET FX you want the site to run against, and pick when you want to upgrade it to the next version.

Questions For You

Now I have some questions for you:

  1. Have you tried your V1.x binaries on V2.0 to see if they work? How did it go?
  2. Ditto for Visual Studio project upgrade from 2003 to the 2005 version.
  3. I have been considering starting a team blog on Compat and Upgrade issues. I think it would give the community a good place to ask questions, see common issues and their resolutions, and get/give feedback. Would you use this resource if it existed?
  4. How long do you think we should leave an obsolete API in the system to ensure app compat? My assertion (which is hotly debated) is 'forever'. The goal of 'obsolete' is to tell a developer when there is a better (and sometimes safer or more reliable) way to accomplish a task.
  5. Are there any other hot button issues you have around this topic we can give you more information on?

We'd Like Your Help

Finally, now that Beta 2 has gone live I'd like your help in making sure Whidbey meets your needs. As with any pre-release software, you should take proper caution with your machine and read through all of the release notes. If you are at all uncertain, you may consider using a test machine or a VPC image, rather than your primary development machine, to do the testing.

The things we want to try include:

  • With your application already installed on the machine, install Whidbey and make sure your app continues to work fine.
  • Assuming your application is using V1.x, force your application to run on 2.0 and make sure it still works. The easiest way to do this is by dropping a .config file next to the .exe with <requiredRuntime version="v2.0.50215" /> in the <startup> section. (note that v2.0.50215 is Beta 2)
  • Try installing your application on a machine that already has Whidbey on it (make sure your own install logic doesn't fail)
  • Upgrade your project in Visual Studio. Were you able to successfully get it recompiled on the new version and executing? You may hit some "by design" issues when recompiling, such as new keywords or ambiguous overloads. More data can be found on MSDN for these cases.
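For the second item in the list above, the config file needs the usual <configuration> root element around the <startup> section, and it is named after the executable (MyApp.exe.config for MyApp.exe). As a convenience, here is a small Python sketch that writes such a file next to a given .exe. The script and its function name are assumptions for illustration; creating the file by hand in a text editor works just as well.

```python
CONFIG_TEMPLATE = """<?xml version="1.0"?>
<configuration>
  <startup>
    <requiredRuntime version="{version}" />
  </startup>
</configuration>
"""


def write_runtime_config(exe_path, version="v2.0.50215"):
    """Write <exe_path>.config asking for the given runtime version.

    The default, v2.0.50215, is the Beta 2 build mentioned above.
    """
    config_path = exe_path + ".config"
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(CONFIG_TEMPLATE.format(version=version))
    return config_path
```

Delete the generated .config file when you are done testing so the application goes back to its default runtime selection.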

If you do find issues, we'd like to hear about them. You can do this on the MSDN Product Feedback Center (aka Ladybug) by filing a bug or suggestion with [Category = Compatibility]. We comb through this data routinely looking for issues that are blocking you.

We are also very interested in getting copies of applications you are willing to share with us. Jay Roxe has a detailed explanation of how to do this on his blog. We would be happy to run these in our regular test passes to help make sure your software will work on new versions we come out with.

More Data

Thanks for making it this far. You can find more data at these locations: