Command-line Trivia

Many people have wondered why the C# compiler isn't smarter about some of its command-line arguments.  Here's a summary of the kinds of questions I hope to answer in this post:

  • Why doesn't the C# compiler accept /reference:*.dll?
  • Why doesn't the C# compiler accept /addmodule:*.*?
  • Why doesn't the C# compiler just automatically add all the dependencies? Or if I'm only using a.dll, why do I have to reference b.dll?
  • Why do I have to add /reference:MyAssembly.dll, but not /reference:System.dll?
  • Why doesn't the C# compiler accept /resource:*.resources?

Basically it's because of our premise that our developers/users are smart and don't like too much black magic going on under the covers.  We thought that developers would always want to know exactly what their dependencies are.  By forcing you to put each dependency on the command line, we make you think up front about what your dependencies are.  Wildcards do not help with that.
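If you really do want wildcard-like convenience, one option is to expand the pattern yourself and keep the result where you can review it.  The following is a minimal Python sketch of that idea, not anything the compiler does: the function names and the refs.rsp file name are hypothetical, and it assumes the matching files are the assemblies you actually intend to reference.

```python
import glob
import os


def explicit_references(pattern, directory="."):
    """Expand a wildcard such as '*.dll' into one /reference: option per file.

    The list is sorted so the output is stable from run to run.
    """
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    return ["/reference:" + path for path in matches]


def write_response_file(options, rsp_path):
    """Write one option per line so the list can be reviewed before use."""
    with open(rsp_path, "w") as handle:
        for option in options:
            handle.write(option + "\n")


if __name__ == "__main__":
    # Hypothetical usage: review refs.rsp, then run `csc @refs.rsp Program.cs`.
    write_response_file(explicit_references("*.dll"), "refs.rsp")
```

The point of writing the list to a file is that the dependencies stay explicit: you can read, edit and check in exactly what gets referenced.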

The second issue arises from lookup rules.  In the native world, C/C++ programmers use the LIB environment variable to tell the linker where to go looking for LIB files (the C# compiler also uses that environment variable to find references).  Classic VB users would just open the add component dialog box, which basically walked HKEY_CLASSES_ROOT for all registered COM/ActiveX type libraries, objects and interfaces.  The downside to the VB model was that developers often had no way of knowing which components they had rights to redistribute.  So that is one reason why the C# compiler never looks in the GAC for references.  Our model is “GAC is for runtime, use someplace else for compile-time”.  That's why the framework SDK installs assemblies like System.dll into two places: the GAC for runtime and the SDK directory for compile-time.

Does anybody really fully understand fusion's complicated lookup/binding rules? Especially when you throw in machine, user, application and publisher policy?  I don't.  Every time they're explained to me, they all seem to have good reasons and make sense for applications, but not for developers.  Instead we went with a classical path-based lookup.  MSDN clearly explains which directories the compiler will search and in what order in this article.
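To make "classical path-based lookup" concrete, here is a small Python sketch of the general technique: try each directory in a fixed order and take the first file that exists.  It is an illustration rather than the compiler's actual code.  The function names are hypothetical, and the order used here (current directory, runtime directory, /lib directories, then the LIB environment variable) is my understanding of the documented order, so treat it as an assumption and check the documentation.

```python
import os


def resolve_reference(name, search_dirs):
    """Return the first existing path for `name`, or None if nothing matches.

    An absolute path is checked as-is; a bare file name is tried against
    each directory in order, and the first hit wins.
    """
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def default_search_dirs(runtime_dir, lib_option_dirs, environ=os.environ):
    """Build the assumed search order described in the paragraph above.

    LIB is split on os.pathsep, which is ';' on Windows.
    """
    lib_env = [d for d in environ.get("LIB", "").split(os.pathsep) if d]
    return [os.getcwd(), runtime_dir] + list(lib_option_dirs) + lib_env
```

Compared with policy-driven binding, the appeal of this model is that the result depends only on the file name and an ordered list of directories, both of which are visible to the developer.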

As far as assembly dependencies go, the compiler possibly could do better, but it would end up having rules very similar to fusion.  The simplest problem is that there is no guaranteed relationship between an assembly name and the actual file name for that assembly.  I think that's one reason why fusion has such complex rules.  Thus to look up second-order dependencies the compiler would either need to rely on fusion (which we already decided was bad) or put in some black magic of our own.  The decision was to put the developer in charge and force them to declare exactly which files to use as dependencies.

Module dependencies are even harder.  I suppose we could do slightly better there, but multi-file assemblies are extremely rare and the metadata doesn't help us out as much as we'd like.  Unlike assembly references, module references actually contain the filename.  The problem arises because the ModuleRef table also stores other stuff, like all the filenames from DllImportAttributes.  Thus there's no easy way to tell the difference between a dependent managed module and a native dependency that's not part of the assembly (like kernel32.dll for instance).  Also, when building an assembly it is possible that not all the modules will be referenced.  So again we decided to put the programmer in charge and make you explicitly list all of them.

Many of you have noticed that you never need to explicitly reference Microsoft assemblies like mscorlib.dll or System.dll.  The main reason I guess is because we're self-centered.  Well I guess not completely because we do have some almost good reasons.

First I'll get to mscorlib.dll.  In case you haven't noticed, there are a few types the compiler just needs regardless of what you're compiling.  These include (this is by no means a complete list) System.Int32, System.Object, System.String, System.Delegate, System.Enum, etc.  All of these are in mscorlib.dll.  Because these always have to be there, everybody (except those actually building mscorlib.dll) needs to reference mscorlib.dll, so the compiler tries to save you a little redundancy by always implicitly referencing mscorlib.dll.  If you don't want it, you have to turn it off with /nostdlib.

Now for System.dll and its friends.  Here is where I think we might have gone a little overboard.  Even though we wanted you, the developer, to have total control of your dependencies, it became a little unwieldy to compile simple WinForms apps from the command line.  At one point we had upwards of 40 different assemblies you needed to reference!  Since we already had response files, we thought we'd make a 'default' response file that just referenced all of those assemblies, since everybody wants them (sarcasm intended).  So we created a default response file that always gets used in every command-line compilation unless turned off with /noconfig.  Please go look in your runtime directory right now at csc.rsp (it's right next to csc.exe).  Please take some time to edit it to suit your needs; it can save you a lot of typing!  Go ahead and add your favorite references, and remove the ones you don't want.  Also feel free to add flags to adjust the default command-line options.  You can always override them by using the opposite flag on the actual command line.
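The way the default response file combines with your own arguments can be modeled in a few lines of Python.  This is only a sketch of the behavior described above, not the compiler's implementation: the function names are hypothetical, it assumes '#' starts a comment line in a response file, it recognizes only the exact spelling /noconfig, and it splits on whitespace, so quoted paths containing spaces are not handled.

```python
def parse_response_file(text):
    """Split response-file text into options, skipping blanks and # comments."""
    options = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Naive split: a quoted path with spaces would be broken apart here.
        options.extend(line.split())
    return options


def effective_options(default_rsp_text, command_line):
    """Put the default options first unless /noconfig was given.

    The explicit arguments come last, which models the idea that an
    opposite flag on the real command line can override a default.
    """
    if any(arg.lower() == "/noconfig" for arg in command_line):
        return list(command_line)
    return parse_response_file(default_rsp_text) + list(command_line)
```

For example, given a made-up csc.rsp containing the two lines `/r:System.dll` and `/r:System.Data.dll`, calling `effective_options` with `["Program.cs"]` yields both references followed by `Program.cs`, while adding `/noconfig` yields only what you typed.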

The resources question is just going to have to wait for another day.  I think it really is worthy of its own post.

--Grant