Some thoughts on the Global Assembly Cache and the compatibility of assemblies

Enormous Caveat --- what follows is my opinion and my interpretation of the opinions of people around me; it does not represent the advice of Microsoft or the .NET Framework team.

Over the last month I have been embroiled in a series of long and interesting email threads about whether assemblies should be added to the GAC or not.  This has highlighted that the advice that we have given up to now may be incomplete.  We haven't yet concluded the discussion and so the ultimate advice may well differ from the conclusions I reach in this article.

The thrust of the question is: under what conditions should an application's assemblies be placed in the GAC? Additionally, what compatibility burdens are placed on the author of an assembly that exposes public types and members?

The current advice of record is that the Global Assembly Cache should be used for assemblies that are shared between applications, i.e. used by more than one app.  This yields several benefits: less disk space is used, since each app references the same assembly; the installer can track the applications that add the assembly and ensure that it is only removed when the last application that depends on it is uninstalled; and assemblies can be administered centrally.  By default the GAC is only modifiable by an administrator, so assemblies in it are safe from tampering by normal users.  Another benefit is that the CLR can load assemblies from the GAC a bit faster than from other locations.

You should note that assembly load is a one-time event, so any performance benefit is probably tiny and does not grow with the number of times an assembly is requested: the CLR satisfies subsequent load requests from the already loaded instance of the assembly.  The one exception is an AppDomain unload, which also unloads the assembly unless the assembly is shared between app domains.

The faster load from the GAC comes from a few optimizations:

  1. The first optimization is for the CLR to look in the GAC first when resolving an assembly reference.  If it doesn't find the requested assembly in the GAC then it looks in the application directory and follows the rest of the probing algorithm.  Clearly there is a performance gain from putting the assembly in the first place to be searched.
  2. When the CLR loads a strong-named assembly it computes a hash for that assembly and compares it with the hash that was computed at compile time, or when the strong name signature was applied.  If the hashes match then the CLR continues the load because it is sure the assembly was not tampered with.  When loading from the GAC the CLR doesn't need to do that, since the hashes were compared when the assembly was added to the GAC and the assembly couldn't have been tampered with afterwards, except by an administrator, who could have done much worse things to your PC just by virtue of being an admin.  Not recomputing the hash is beneficial because computing it touches the whole file, which causes the file to be loaded into memory, potentially causing the virtual memory manager to page out information that may be accessed again shortly.  When loading an
  3. We can cache the resolutions to native images for assemblies that are loaded from the GAC and compiled using ngen.
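
To see which of your assemblies actually benefit from these optimizations, you can ask the runtime where each one was loaded from. The following is a minimal sketch, assuming the .NET Framework (where `Assembly.GlobalAssemblyCache` is meaningful); the class name `GacCheck` is made up for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: reports whether an assembly was loaded from the GAC.
public static class GacCheck
{
    public static void Report(Assembly assembly)
    {
        // GlobalAssemblyCache is true when the assembly was loaded from the GAC.
        Console.WriteLine("{0}: loaded from GAC = {1}, location = {2}",
            assembly.GetName().Name,
            assembly.GlobalAssemblyCache,
            assembly.Location);
    }

    public static void Main()
    {
        // A framework assembly (System.dll) and the application's own assembly.
        Report(typeof(Uri).Assembly);
        Report(Assembly.GetExecutingAssembly());
    }
}
```

An assembly that reports a location under the application directory went through the normal probing and hash verification path described above.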

These optimizations are sufficiently compelling to some people that the decision to install an assembly in the GAC is not as simple as it used to be.  In the past it would be based solely on the intent to share the assembly between apps, or on the need to satisfy more complicated binding logic, for instance using multiple versions of an assembly in a single application, or loading a type in different AppDomains with separate AppBase directories.  Now the performance consideration is brought up more and more frequently.


As the list above shows, the GAC has the emergent property of being a performance accelerator.  Performance enhancements, large or small, are always welcome and will therefore likely be taken advantage of.  That brings us to the issue that has been consuming our email bandwidth: what compatibility issues arise from extending the use of the GAC beyond shared assemblies?  By compatibility we are not talking about the policy that causes different versions of the assembly to be loaded.  Here it is more like source code compatibility: if an application is recompiled against the new assembly, will it require any modifications to run correctly?  Clearly, if compatibility is high then policy could be used to bind applications to the newer version, but that is not the issue here.

A common understanding among the developers I have discussed this with is as follows:

  • If my intent is for the assembly to be shared then any public or protected APIs that are exposed form a contract; it is my responsibility to ensure that the contract is honored in future versions.  When it was pointed out that any application that bound to the earlier version would still use the earlier version even in the presence of the later version of the assembly, it was still agreed that the original assembly created an expectation in the minds of customers that the contract should be kept.  This would mean that an API in an early version of an assembly must be present in subsequent versions with the same semantics unless it has been marked as deprecated with the Obsolete attribute.
  • If my intent is for the assembly to be an implementation detail then any public or protected APIs that are exposed do not form a contract, and no expectation is created in the minds of anyone about future versions of that assembly.  This would mean that an API in an early version of an assembly need not be present in subsequent versions.
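
In the .NET Framework, deprecating an API is done with `System.ObsoleteAttribute`. The sketch below shows how a shared assembly might keep an old member alive while steering callers to its replacement; the `ReportWriter` type and its members are invented for illustration.

```csharp
using System;

// Hypothetical public type in a shared assembly.
public class ReportWriter
{
    // The old API stays in place with the same semantics, but is flagged as deprecated.
    [Obsolete("Use Write(string, bool) instead.")]
    public void Write(string text)
    {
        Write(text, true);
    }

    // The replacement API introduced in a later version.
    public void Write(string text, bool appendNewLine)
    {
        if (appendNewLine)
            Console.WriteLine(text);
        else
            Console.Write(text);
    }
}

public static class Program
{
    public static void Main()
    {
        ReportWriter writer = new ReportWriter();
        writer.Write("old overload");             // compiles, with warning CS0618
        writer.Write("preferred overload", true); // no warning
    }
}
```

Existing callers keep compiling and running, and the compiler warning tells them which member to move to before the old one is eventually removed.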

We also considered the possibility that any public or protected API in an assembly creates an expectation that it will exist in future versions of that assembly, regardless of whether the assembly is an implementation detail or part of a framework.  Everyone agreed that this places a high burden on the developer of an assembly.  Some argued that even though the burden was high, it was still right because the developer of the assembly had exposed those APIs as public or protected.  In released versions of the .NET Framework it is pretty hard to minimize the public contract of an assembly, but Whidbey has friend assemblies, so the public contract of assemblies that are part of a larger application can be minimized or even eliminated.
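
Friend assemblies in Whidbey are declared with the `InternalsVisibleTo` attribute from `System.Runtime.CompilerServices`. Here is a minimal sketch, assuming two unsigned assemblies with the made-up names `Contoso.Engine` and `Contoso.App`:

```csharp
// Compiled into the hypothetical implementation assembly Contoso.Engine.
using System.Runtime.CompilerServices;

// Grant the (hypothetical) Contoso.App assembly access to internal types.
[assembly: InternalsVisibleTo("Contoso.App")]

namespace Contoso.Engine
{
    // Internal: invisible to every assembly except the named friend.
    internal static class PriceCalculator
    {
        internal static decimal AddTax(decimal net, decimal rate)
        {
            return net + net * rate;
        }
    }
}
```

Code in `Contoso.App` can call `PriceCalculator.AddTax` as if it were public, while any other assembly sees no public contract at all. If the assemblies are strong-named, the friend has to be identified by its full public key as well, in the form `"Contoso.App, PublicKey=..."`.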

One thing is certain: the current machinery of the CLR offers no foolproof mechanism by which a developer can mark an assembly as either an implementation detail or a framework assembly.

It was suggested that, because the GAC is for shared assemblies and multiple versions can exist in it side by side, there is a strong argument that a developer who places an assembly in the GAC is making a strong statement about the future compatibility of that assembly.  Assemblies not installed in the GAC could be considered implementation details whose future compatibility is not guaranteed or even likely.


Because of the changing nature of the GAC it is probably no longer safe to assume that assemblies in the GAC may be reused at will.  Perhaps it never was safe.  It is also not safe to assume that an assembly's API set will remain constant unless the developer has specifically declared that to be the case.  The developer of any assembly should minimize its public API set to reduce the compatibility burden they carry into later versions of shipped assemblies.

The following recommendations are those that I think are prudent.  I expect that by the time Whidbey is released the CLR team will have published a comprehensive set of guidelines, and they may well differ from these.

Recommendations:

  1. When consuming a third party assembly, examine the license agreement or any documentation provided by the author describing the conditions that apply to the reuse of that assembly.  If they haven't explicitly said that it is reusable, then you should avoid reusing it.
  2. If it is a reusable assembly, see what statements about future compatibility plans for that assembly the developer is prepared to make.  If the developer has not stated that future versions will maintain the current contracts to the best of their ability you should consider whether you wish to take that dependency.
  3. When developing an assembly regardless of your compatibility intent:
    • Keep public APIs to the bare minimum because:
      • It will be easier to produce future versions that are compatible
      • There will be fewer unintended dependencies taken on your assembly
      • There will be fewer potential security holes
    • Always set the ComVisible attribute to false if the assembly is not intended to be used from COM.
    • Never add the AllowPartiallyTrustedCallers attribute unless it is required and you have performed the necessary security checks on your code.
    • Consider adding a LinkDemand for StrongNameIdentityPermission to limit the callers to those assemblies signed with a specific key when the intent is not to share the assembly.
  4. When developing an application that consists of multiple assemblies, make all of the types internal and use the new friend assemblies feature in Whidbey to restrict the assemblies that can consume them.
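
As a rough illustration of recommendation 3, an assembly that is not meant for COM and wants a small public surface might look like the sketch below. The namespace and type names are invented, and the one public type stands in for whatever you have genuinely decided to support.

```csharp
using System;
using System.Runtime.InteropServices;

// Hide every type in this assembly from COM unless a type opts back in.
[assembly: ComVisible(false)]

// Note: no AllowPartiallyTrustedCallers attribute is applied here.

namespace Contoso.Engine // hypothetical namespace
{
    // The only public type: this is the contract you commit to keeping.
    public sealed class LicenseChecker
    {
        public bool IsValid(string key)
        {
            return KeyFormat.LooksValid(key);
        }
    }

    // Implementation detail: internal, so no outside code can depend on it.
    internal static class KeyFormat
    {
        internal static bool LooksValid(string key)
        {
            return !String.IsNullOrEmpty(key) && key.Length == 16;
        }
    }
}
```

Everything marked internal can change or disappear in the next version without breaking anyone, which is the point of keeping the public set small.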

 

Until next time

 

Kevin