Multi-file Assemblies: What and How

I intentionally left out the 'why' because only you, as the developer of your own product, can decide why a given programming feature should or shouldn't be used.

First the easy part: What

Multi-file assemblies are simply assemblies that consist of more than one file.  That raises the question of what an assembly is.  Although I've seen several very amorphous descriptions of what an assembly is logically or theoretically, from my pragmatic compiler background I prefer to describe it literally, the way I see it.  An assembly is a grouping of one or more files, where exactly one of those files contains an assembly manifest.

Now I know that there are such things as native manifests, but in this case I'm referring to the AssemblyDef record in normal metadata.  If you run ildasm, it is what you see under the tree node named “MANIFEST”, in the section that begins with “.assembly 'SomeName' { ... }”.  That same manifest also contains a list of FileDef records naming the other files that physically constitute the assembly.  The interesting tidbit here is that the files that do not contain the manifest also do not contain any back-pointers to the assembly they 'belong' to.  Thus a single file can be part of several different assemblies, but don't do it, because it will cause tons of other problems down the road.
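If you'd rather not click through the GUI, ildasm can also dump its output to the console.  A minimal sketch, assuming a hypothetical assembly named MyLib.dll; the ILDASM variable is purely an illustration device and defaults to a dry run that only prints the command:

```shell
# MyLib.dll is a hypothetical name; substitute your own manifest file.
# ILDASM defaults to a dry run that just prints the command.
# Set ILDASM=ildasm to really run the disassembler.
ILDASM="${ILDASM:-echo ildasm}"

# /text sends the disassembly to the console instead of opening the GUI.
$ILDASM /text MyLib.dll
```

In the real output, look for the “.assembly” block (the AssemblyDef).  For a multi-file assembly, the FileDef records should appear near it as “.file” lines, one for each of the other files.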

How:

There are several ways to create a multi-file assembly.  None of them can be done inside the IDE with C# or VB.NET without special build rules or custom actions.  (Yes, C++ can do it inside the IDE.)  So from here on out I will assume everybody knows how to use the command-line tools.

First you need to build a file without an assembly manifest.  For C# and VB.NET, just use “/target:module”.  The next step is to include that module in an assembly.  This is where things get a little tricky, and due to my lack of complete knowledge, I'll limit myself to C#.  First of all, there is no tool that MS sells that will allow you to modify an assembly once it has been built: there are type identity issues, code-signing issues, etc.  So you always need to save building the manifest file (the file that contains the assembly manifest) until the very last step.  You can include an existing module in an assembly as you build it by simply adding “/addmodule:<module name>” to your command line.  However, the compiler is just plain not as efficient this way as it could be if you did things slightly differently.
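The two-step approach can be sketched as follows.  The file names (Helpers.cs, Main.cs, MyLib.dll) are made up for illustration, and the CSC variable is only there so the script defaults to a dry run that prints the commands:

```shell
# File names are hypothetical.  CSC defaults to a dry run that prints each
# command; set CSC=csc to actually compile.
CSC="${CSC:-echo csc}"

# Step 1: compile Helpers.cs into a module, i.e. a file with no assembly manifest.
$CSC /target:module /out:Helpers.netmodule Helpers.cs

# Step 2: build the manifest file last, adding the module to the assembly.
$CSC /target:library /out:MyLib.dll /addmodule:Helpers.netmodule Main.cs
```

The result is one assembly made of two files, so Helpers.netmodule has to be deployed in the same directory as MyLib.dll.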

Did you know the C# compiler allows you to produce multiple output files in a single compilation?  It does.  The gotcha is that it will only produce one assembly, so if you have multiple “/out” or “/target” switches on the same command line, you're building a multi-file assembly!  This is logically the same as doing the two-step “/target:module” and then “/addmodule:”, except that the compiler is slightly more efficient with some of the metadata.
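A rough sketch of the single-compilation form, again with made-up file names and a dry-run CSC variable.  It assumes that each “/out” and “/target” applies to the source files that follow it and that the first output is the one that gets the manifest; check the compiler version you have, since newer compilers may not accept multiple outputs at all:

```shell
# File names are hypothetical.  CSC defaults to a dry run that prints the
# command; set CSC=csc to actually compile.
CSC="${CSC:-echo csc}"

# One compilation, two outputs (assumed ordering: switches apply to the
# sources after them).  MyLib.dll gets the manifest; Helpers.netmodule
# becomes a second file of the same assembly.
$CSC /target:library /out:MyLib.dll Main.cs /target:module /out:Helpers.netmodule Helpers.cs
```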

If you're really desperate to show off your geek-ness, you can build a multi-file assembly using al.exe, but it's not worth it.  See my previous post on why you should use csc.exe instead.

Why:

OK, so I changed my mind: I will give you a few possible reasons why you might want to use multi-file assemblies.  Just remember that these are only possible reasons.  You still must think and decide for yourself in your situation!

Multi-language assemblies: this I think is the biggest possible reason.  Maybe some of your programmers only know VB; maybe you need to interop with some old C/C++ code.  The list goes on forever, because there are an infinite number of reasons why you might want to write parts of your application in different languages.  Now you could put each language in its own assembly, but then you get stuck with a poor organization model, and with the choice of making certain things public when they really shouldn't be and then trying to compensate with fancy security or obfuscation.  As I pointed out in another post, this reason is going away (now you can have multi-language single-file assemblies)!

Incremental Download: back when we were still trying to figure out what exactly an assembly was, this was the poster-child example for multi-file assemblies.  Now I think it was really more of a straw man.  Either way, I'm not even sure we actually support it, but the original idea was that in an Internet or web-based application you would put the 'common-use' code in one file and the less-used code in another file.  That way users who only needed MyReallySmallReallyFast90PercentCase functionality got it, and weren't punished with long download times for MyReallyBigAndSlow10PercentCase that few people needed.  The app would download the small part, start running, and only download the other files (which contained the potentially bigger and less frequently needed code) as needed.  It seems like a great idea on paper, but now that more and more people have broadband, I seriously question its value.

Linked Resources: the idea here was that you had some file that was logically part of your assembly, but needed to be its own file for some reason.  Possibly some native API needed to read it and only accepted a file name argument.  Possibly the file was itself a native DLL.  Possibly you wanted the user to be able to modify it (like a splash screen bitmap or something).  At one point in the design we even had a flag on the FileDef record indicating that a given file was 'modifiable'.  I don't think that flag survived, and I know there's no way to set it with C#, VB, C++, or ALink.  This reason I think actually has some merit, but it is definitely not common.
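For C#, the switch that produces a linked resource is “/linkresource”.  A minimal sketch, using a hypothetical Splash.bmp and the same dry-run CSC variable as a safety net:

```shell
# File names are hypothetical.  CSC defaults to a dry run that prints the
# command; set CSC=csc to actually compile.
CSC="${CSC:-echo csc}"

# /linkresource records Splash.bmp in the assembly manifest but leaves it
# as a separate file on disk next to MyLib.dll.
$CSC /target:library /out:MyLib.dll /linkresource:Splash.bmp Main.cs
```

Contrast this with “/resource”, which embeds the file's contents inside the output file instead of linking to it.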

If you've got another good reason for multi-file assemblies, speak up, I'd love to hear about it.

--Grant