Update on root cause of Package Load Failures in VS 2005 beta 2

I got a chance to investigate a couple of machines from Microsoft employees who ran into issues after trying to uninstall VS 2005 and .NET Framework beta 1 and installing VS 2005 and .NET Framework beta 2. The machines I looked at had problems because Package Load Failure dialogs appeared when launching the IDE or creating projects in VS 2005 beta 2. I was able to fix the machines by doing the following;

  1. Locate all of the files in %windir%\assembly that had file versions starting with 2.0 or 8.0 where the 3rd part of the file version (not the assembly version) does not equal 50215 (the VS 2005 and .NET Framework 2.0 beta 2 build number)
  2. Open a cmd prompt and manually delete the files and folders that were in that list

The most common problem I saw was that there were 2 copies of the file Microsoft.VisualStudio.Shell.Interop.8.0.dll in the GAC, one from beta 1 and one from beta 2. For some reason, the beta 1 version takes precedence when devenv.exe launches and the VS IDE package fails to load.

I am working on compiling a full list of assemblies that exhibit this behavior and once I get that, I will update my cleanup tool to automatically remove these files. However, I probably will not be done until sometime tomorrow so I wanted to post a manual workaround in the meantime for anyone who reads this blog post. Hopefully I'll be able to save some drive reformatting and make beta 2 installation and usage a little less painful.

Technical details if you are interested

In both cases that I investigated today, the .NET Framework beta 1 was uninstalled before VS 2005 beta 1, and that caused VS 2005 beta 1 uninstall to fail to remove some assemblies from the GAC because fusion.dll is used to install/uninstall assemblies and since fusion.dll was installed as a part of .NET Framework 2.0 beta 1, it was removed by the uninstall and was therefore not present to perform the GAC uninstall during VS 2005 beta 1 uninstall.

This uninstall issue should cause all VS assemblies to be orphaned in the GAC. However, installing VS 2005 beta 2 should update the versions of the files in the GAC because they have higher file versions in beta 2 than in beta 1 (since the bug I described here has been fixed). But there is one critical issue that causes a few of the assemblies to not be updated. In the case of the file Microsoft.VisualStudio.Shell.Interop.8.0.dll that I listed above, it contained metadata that identifed it as an MSIL assembly in beta 1, but the MSIL tag was removed for beta 2 for some reason. So Fusion treats it as a completely different assembly and uses a different physical location on disk to store it in the GAC, and as a result the machine ends up with 2 copies of that file. I haven't gotten a complete list of orphaned files yet, but my theory is that all of the other orphaned files are caused by similar metadata changes.

Really in-depth technical details if you are really interested

This problem theoretically exists for VS 2002/.NET 1.0 and VS 2003/.NET 1.1, but there is a design change we took in the .NET Framework 2.0 that makes this problem worse. In 1.0/1.1, you will see the .NET Framework create a folder under %windir%\system32 named UrtTemp that contains copies of a few files that Windows Installer uses to install assemblies. These files are needed because the .NET Framework contains the code needed to install assemblies using the MsiAssembly and MsiAssemblyName tables, but it also needs to install assemblies itself (the classic chicken-and-egg problem).

In 2.0, we wrote a custom action to call fusion APIs directly to install assemblies because of a limitation in Windows Installer that would not let us use the DuplicateFile table if we needed to install files to the file system and the GAC. This liimitation had caused us to carry 2 copies of each assembly in dotnetfx.exe, which caused the download size to be unnecessarily high. In order to reduce size of the .NET Framework we had to use this custom action, which is why you see Assembly and AssemblyName tables in netfx.msi and do not see anything in the MsiAssembly and MsiAssemblyName tables.

Because we implemented this custom action, we were also able to eliminate the part of setup that copied files to UrtTemp. In the past, uninstalling .NET Framework would leave behind the files in UrtTemp so that assembly uninstall could be accomplished by using those copies of the fusion files. However, in 2.0 we don't use UrtTemp so after 2.0 is uninstalled, there are no files left behind to use for future GAC uninstalls. Visual Studio setup does not fail and rollback if GAC uninstall fails because we decided that if a user chooses to uninstall and files are left behind, we should not rollback and add all the files back to the machine. For a released product that is probably OK, but for beta versions, that can cause orphaned files that interfere with future releases of the same product as seen above.

The other complicating factor here is that the .NET Framework is a prerequiste for any application that is written in managed code (such as parts of Visual Studio), but we do not have any way of reference-counting the .NET Framework to prevent the user from uninstalling the .NET Framework out from under any applications that need it. Because of this, even though we know that uninstalling .NET Framework 2.0 beta 1 will cause orphaned files when you try to uninstall VS 2005 beta 2 later on, we cannot prevent the user from performing the uninstall. We have iterated over several proposed designs to create a reference-counting scheme but found too many confusing scenarios where we would pop up dialog boxes strongly suggesting that the user not uninstall, or scenarios where we block uninstall when we should not, or scenarios where 3rd party applications failed to increment reference counts because it requires extra work that was too obscure or difficult. So for 2.0, we will have to live with this type of behavior.

I suppose the ultimate answer will be close to what I always wanted to see - the .NET Framework is a system component that is not uninstallable. You see that on Windows Server 2003 where .NET Framework 1.1 is part of the OS and is always present, and you will likely also see this on Longhorn for .NET 2.0.