Microsoft Technology Summit Notes on the CLR

03/17/2005

This morning I met with a group of industry influencers who are primarily experts in the JVM community. I brought with me a few folks from the CLR team, including Jim Miller, Brad Abrams, and Jim Hugunin.

Rather than do a lot of presenting (I had only 9 slides including the title page), we spent most of the time fielding some really good questions. I wanted to jot down a few additional points here:

Garbage Collection

There were lots of questions on how our collector works. This article by Rico is a great resource for full details. A few notes:

We do have both client (built for interactive responsiveness) and server (built for CPU scaling) versions of the collector.
Our collector doesn't use handle indirection for references, so if you write a for() loop we will enregister the reference and do all the heavy lifting to track/update the roots ourselves.
Our philosophy is not to have a ton of knobs on the GC. We think that for many typical programmers it is too easy to get into trouble. The response from the audience was twofold on this point: (1) they generally felt that the JVM had overdone this with far too many switches and developer-configured settings, but also (2) some developers do want more power. We do support advanced settings through our hosting APIs if you need them.
There was a question about whether we were adding the ability for developers to plug in their own GC routines. We are not planning to do this now or very likely ever. These kinds of customizations can make it harder to write a successful self-tuning GC, because they change the heuristics too much.

Memory Overhead and Sharing

I walked through how an Assembly is found on the machine, how the CLR is bootstrapped, and how we use the GAC for versioning work. A few interesting notes on this:

At runtime, any IL, Metadata, etc. in your Assembly will be shared by Windows using the normal PE loading techniques.
We have the NGen compiler, which allows you to pre-compile your code and share pages at runtime. I have a long article on this subject here. This means that NGen'd code is also shared between processes.
In practical terms, it means if one CLR app is already running on the machine, the next one will start up a lot faster because the system is "warm". You are running the same native code for System.String in every process and don't have to wait for it to be recompiled.
Parts of the runtime metadata (v-table and field layout information used by the engine) can also be shared with NGen'd processes.
If you JIT compile, then the code is per process, as is the runtime metadata.
We've done a ton of work in V2.0 to further reduce managed code overhead. This is really a big win for client applications.

Memory Model and Volatility

A question came up about how we handle aggressive memory scheduling on various CPUs. The CLR does support the concept of volatility and read order semantics. This is another example of a complicated trade-off based on developer segment. Lots of people would be very happy never to have to understand the concepts involved. But other segments are going to be looking for every ounce of performance and want access to those features. Our high-level principle is to make the system easy to use and program against by default, but not to totally hide things either.
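
As a rough illustration of the volatility and ordering concepts mentioned above, here is a minimal sketch in Java, chosen because the audience came from the JVM community. It is an analogy only: Java's volatile rules are not identical to the CLR's, and the class and method names are hypothetical.

```java
// Illustrative analogy only: this uses Java's volatile semantics,
// which differ in detail from the CLR's memory model.
final class VolatilePublish {
    private int payload;             // plain field, written before the flag
    private volatile boolean ready;  // volatile flag that publishes the payload

    // Writer side: store the data first, then set the volatile flag.
    void publish(int value) {
        payload = value;
        ready = true;
    }

    // Reader side: spin on the volatile flag, then read the data.
    int awaitPayload() {
        while (!ready) {
            Thread.yield(); // let the writer thread make progress
        }
        // The volatile read above orders this read after the writer's store.
        return payload;
    }

    // Runs one writer thread and reads the published value on the caller's thread.
    static int demo() {
        VolatilePublish box = new VolatilePublish();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        int seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Without the volatile flag, the reader could in principle spin forever or observe a stale payload. That is the kind of detail most developers would rather not think about, and the kind that performance-focused developers want control over.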

Versioning of Assemblies

Versioning is a very tricky subject. A few notes we talked about:

We support side-by-side CLRs on the machine, but only one per process.
We capture strong versioning information at compile time and store it in the metadata.
At deployment time, you can place your strongly versioned assemblies into the Global Assembly Cache (GAC) where we can find them later.
At bind time, we can look at your reference and the version information to figure out which one you required and load that version.
We do support "unification" for the core .NET FX libraries themselves. This allows you to get one consistent stack at the base.
There is a lot of good lower-level material on Suzanne's blog and Junfeng's blog.
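
The deploy-then-bind flow in the notes above can be pictured with a small, purely hypothetical model. The sketch below is written in Java for the JVM audience. It is not the CLR binder or the real GAC layout: the class, the key format, and the paths are all assumptions made for illustration, and unification is not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of version-based binding; not the CLR's actual binder.
final class BindSketch {
    // Stand-in for a machine-wide cache, keyed by assembly name plus version.
    private final Map<String, String> cache = new HashMap<>();

    private static String key(String name, String version) {
        return name + ", Version=" + version;
    }

    // Deployment time: register a strongly versioned assembly.
    void install(String name, String version, String path) {
        cache.put(key(name, version), path);
    }

    // Bind time: the version recorded in the reference selects the exact copy.
    Optional<String> bind(String name, String version) {
        return Optional.ofNullable(cache.get(key(name, version)));
    }
}
```

Because the version is part of the key, two versions of the same library can sit side by side, and each caller gets the one it was compiled against.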

Dynamic Languages

We had several questions about our philosophy on dynamic languages. I think I'll actually write a separate blog post on this one because it is such an interesting subject. A few quick notes for now:

I believe we already have a great platform for dynamic languages. You get the GC, exceptions, interop, and access to a huge set of libraries for free when you target the CLR.
In Whidbey we add Lightweight Code Gen (LCG), which makes it easier still to author these languages.
Because we also built generics deeply into the runtime, you can leverage that support dynamically at runtime without having to "fake it out". This is a distinct advantage over techniques that use "erasure" to replace syntax with expanded code.
Because you target our MSIL and type system, we also enable easy access to tons of tools that work across languages seamlessly.
I expect in the future to take a Project 7 style approach to adding more improvements to the engine, just like we did in the late '90s to form the current engine.
And finally a quick plug for Jim Hugunin, who is doing a keynote at PyCon next week.
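
For readers less familiar with the "erasure" technique contrasted with runtime generics above, here is a minimal Java sketch of what erasure looks like on the JVM. The class and method names are hypothetical. It only demonstrates the erased side of the comparison and is not CLR code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows JVM-style type erasure, the technique the
// notes contrast with generics built into the runtime.
final class ErasureDemo {
    // Returns true when two differently parameterized lists share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // After erasure both objects are plain ArrayList instances;
        // the type arguments are not available at runtime.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```

A dynamic language targeting a runtime with erased generics has to track element types itself, whereas a runtime that keeps generic instantiations distinct can answer that question directly.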

In closing, I want to thank the attendees for their great questions and feedback!
