Why should your language target the CLR?

In my post on a meeting with some JVM experts, I mentioned a few high-level notes on our dynamic language support. James Barnes asked for a response to Patrick Logan's article on the subject. Patrick points out that my post wasn't deep on details, and could really be looked at as trying to force types deeper into typeless languages.

First, I'm very glad to see the interest in dynamic languages in the context of the .NET Framework. I'm very excited about the growing set of developers using dynamic languages being able to further leverage Visual Studio and the rich libraries of the .NET FX as well as new technologies like Avalon and Indigo. Second, I should clarify that my remarks were geared more towards a compiler or language author than a language user (an important distinction, since the author's choices determine what a user sees in the language itself).

The Managed Environment

Managed languages are typically made up of three parts:

Compiler

The language implementation targeting the Engine.

 

Platform Libraries

The Platform Libraries are for accessing databases, networking, and the usual kinds of features you need to be productive as a programmer. Ideally these are written using the language itself and target its instruction set, but often they are written in unmanaged code with an interop layer.

 

Engine

The engine functionality probably includes some kind of collector, support for exceptions and language intrinsics, potentially support to interop with unmanaged code (be it native C or COM or whatever), etc.

A programmer typically lives in the world of their compiler (often through an IDE) and the platform libraries. Ideally the machine-level features aren't in your face. For example, you expect to allocate objects and use them without having to know every nuance of how the collector itself works. Ideally the language gives you a nice way to deal with its constructs (like the using statement in C#). We're not always as successful as we'd like to be, but it is a general goal.

Why Target the CLR When You Already Have a Language VM?

I'd like to see dynamic languages used in more places. Standalone utilities like BitTorrent are cool. But can I utilize the technology in other places, like plug-ins for major applications? Yes, this is possible to do today. But here's the world I would hate to see: a commonly used host application (like a browser) winds up with multiple garbage collectors competing for memory and for the machine cycles to collect it, and tons of duplicated platform libraries clogging up the working set, basically making the app a pig.

While most compiler writers are interested in garbage collection algorithms (hey, who wouldn't be?), I think they would rather spend more of their time innovating in the language. If you author a compiler that targets the CLR, you've extended your development team to include ours working on your behalf. I believe that users of your language get the win as well, through access to a rich set of new platform libraries and better performance.

The next conversation you can have is a blow-by-blow comparison of the features the CLR provides versus what the language environment itself already had. I would rather work with the language authors on that and get their feedback firsthand. That is why I was so excited about getting Jim on the CLR team in the first place: it brings in a world-class expert to help us accomplish the goal.

Access to Platform Libraries

It is obviously important to a language user to get access to the libraries they already know, and to which they've already written a lot of code. For those libraries which target the language itself, this should just work. For example, IronPython runs those libraries which are written in Python. Libraries which are written in native C/C++ are trickier, because their integration with the system may be tightly coupled to the machine level and its programming interfaces.

The next thing that is important is having a rich set of platform libraries that you didn't have easy access to before. We bring you that with the .NET FX. The string object you use in your language is a System.String underneath, and is the string you'll use in C#, VB.Net, or COBOL. Finally, you get access to new managed libraries that vendors like Microsoft are writing (beyond the already rich .NET Framework). Avalon and Indigo are two examples of that reach. Jim did a demo at PyCon showing an IronPython Avalon application.

This notion of composing an application from components written in multiple languages has been a core tenet of the CLR since day one (duh, just spell out the acronym: Common Language Runtime).

Tools Support

Once you have a language which targets the CLR, you can take advantage of a ton of great integrated tools support. As an example, when you write Python code which utilizes libraries written in C# (or VB.Net, and so on), Visual Studio will give you integrated call stacks between the languages automatically. It's all just managed code to VS. Check out the Nemerle project for another non-Microsoft example. The same thing applies to profilers, static analysis tools (like FxCop), etc. I will also point out that there is a rich set of tools from non-Microsoft partners as well.

Have rich tools existed in the past for other environments? Of course! But those tools typically only support a specific language. For example, if you wanted to use code analysis to enforce a set of coding guidelines for your organization, you might have had to use a lint variant for C/C++, and may or may not have had a tool for your VB6 UI code. It's the integration that works across all the code your organization owns that is exciting, not the fact that we have tools in the first place. Having the entire Visual Studio team and a huge user community working on tools that will work with your new language is just a bonus.

Are We Forcing Types on You?

They call them typeless languages for a reason. The compiler is the first key towards making this possible. The Python code you wrote yesterday will work on IronPython today unaltered. The first demo in the IronPython readme is "2 + 2", not "int x=2; int y=2; print x+y;". You aren't required to add strong typing into the language. 

However, objects on the CLR do carry some type information (called the “runtime type”). The types act as a barrier to what can be done to an object at runtime. But that is because a secure platform such as the .NET Framework really must hide some details of object representations from client programs. Dynamic languages do have to take this into account, as they already have to when interacting with an operating system or with code written in other languages (e.g. C). This means some things may need to happen when a dynamic language calls a .NET component: nearly all the time a dynamic language will help the programmer by automatically converting data to the necessary format, but sometimes the programmer may need to ensure that the data is in the right format in the first place. 
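To make the runtime-type point concrete, here is a small sketch in plain Python (nothing IronPython-specific, and not taken from the IronPython readme). No types are declared, yet every value still carries a runtime type, and that type limits which operations are allowed. The helper name `add_numbers` is made up for illustration.

```python
# No declarations are needed, but each value still has a runtime type.
total = 2 + 2
print(total, type(total).__name__)

# The runtime type acts as a barrier: mixing str and int is rejected.
try:
    "2" + 2
except TypeError as err:
    print("rejected:", err)


def add_numbers(a, b):
    """Hypothetical helper: convert both inputs to int before adding."""
    return int(a) + int(b)


# Here the programmer makes sure the data is in the right format first.
print(add_numbers("2", 2))
```

The same split shows up when calling a .NET component: most conversions can happen automatically, and the explicit `int(...)` step stands in for the occasional case where the programmer has to supply the right format.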

That said, I do agree that there are improvements we can make here so that typeless languages feel very natural in the environment and tools. There is work to be done here and I'm looking forward to working with the community to get there. Your feedback is very important to us!

As an aside, projects like F# and Nemerle are beginning to show that even statically typed languages can be essentially typeless, in the sense that you don’t have to write lots of types in your source code: these languages can reconstruct the types for most of your code without having type annotations.

Details on Whidbey (V2.0) Features

You've asked for details on what we've added which assist compiler authors. Here are some examples:

Extensions to Delegates

- We have a new feature called delegate relaxation. You can now bind a delegate to a method whose signature does not match the delegate's exactly, as long as it is covariant on the return type and contravariant on its parameters. In more general terms, the parameter types of the target method can be more general than the parameter types of the delegate, and the return type of the target method can be more specialized than the return type of the delegate.

- We’ve also allowed for more delegate binding scenarios, including: open-instance (a delegate to an instance method that is bound to a particular instance only at invocation time) and closed-static delegates (a delegate to a static method where the first parameter to be passed to the target method is specified once at delegate creation time).
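The two binding shapes can be pictured with a loose analogy in plain Python. This is not the CLR delegate API, only a sketch of when the instance or the first argument gets supplied; `greet` and the variable names are hypothetical.

```python
from functools import partial

# Open-instance analogue: str.upper is not tied to any particular string;
# the instance is supplied only when the call is made.
open_upper = str.upper
print(open_upper("clr"))


def greet(greeting, name):
    """Hypothetical plain function standing in for a static method."""
    return f"{greeting}, {name}!"


# Closed-static analogue: the first argument is fixed once, at creation time.
say_hello = partial(greet, "Hello")
print(say_hello("IronPython"))
```

A dynamic language implementation needs both shapes: method lookups where the receiver is only known at the call site, and closures where some state is captured up front.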

Reflection token/handle Resolution APIs

- Reflection now exposes runtime identity abstractions, along with new APIs that work with them, specifically for compiler writers and others who want more control over Reflection. We’ve exposed managed constructs for metadata tokens and runtime handles. Exposing metadata tokens gives a compiler writer unique metadata-level identity, allowing them to do things like build real symbol table abstractions. Runtime handles give languages unique runtime-level identity (among other things), perfect for the late binding scenarios commonly found in dynamic languages.

Lightweight Code Generation

- New support for runtime method-level code generation. These LCG methods are first class citizens of the runtime, and for construction of these methods we utilize the familiar Reflection.Emit MethodBuilder APIs. LCG methods are also GC reclaimable. This allows a language author to cheaply create functions that get used and then thrown away. In V1.x, the closest thing we had was Method Rental, which came with the overhead of metadata declaration. You can find some examples on Joel's blog.
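The use-then-discard pattern can be sketched with a loose analogy in plain Python. This is not the CLR API and it emits no IL; it only shows a function being created at runtime, called, and then reclaimed by the collector once nothing refers to it. `make_adder` is a hypothetical helper.

```python
import gc
import weakref


def make_adder(n):
    """Hypothetical helper: build a one-argument function from source text at runtime."""
    source = f"def adder(x):\n    return x + {n}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["adder"]


add_five = make_adder(5)
result = add_five(10)
print(result)

# Track the generated function without keeping it alive.
tracker = weakref.ref(add_five)

# Drop the only strong reference; the collector can now reclaim the function.
del add_five
gc.collect()
print("reclaimed:", tracker() is None)
```

The weak reference is there only to observe that the generated function really does go away; a language runtime would simply drop its reference and move on.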

Generics

- Generics increase the richness of runtime type information that is attached to objects: you can query an object to detect if it is a List<string> or List<Button>. Some dynamic languages will ignore this information, but some may wish to use it in order to do different things when a method is invoked on different kinds of objects. Generics also mean that dynamic languages will occasionally have to specify some extra information or insert some extra “impedance matching” conversions to ensure objects get the right runtime type information attached.

But the cool thing about generics is they will help other languages use code written in dynamic languages. This is really important, and may take a while to appreciate, since people who advocate using dynamic languages aren’t really used to people wanting to call into their code. But in the long run I expect lots of great components to be written in IronPython and other languages, and once that happens it’s interesting to think how the C# and VB programmers are going to use those components. 

For example, let’s consider a hypothetical dynamic language PyLisp, and say a C# programmer comes across a really cool module written in this language. Let’s assume PyLisp represents all values using a runtime type “DynamicObj”. What will a C# programmer do to convert a List<string> to a “DynamicObj”? How about a List<int>? Of course, he or she will write a generic function:

          DynamicObj ConvertToPyLisp<T>(List<T> myData)

Highly efficient versions of this function may be possible depending on what “T” is, and the implementation of .NET generics will generate especially efficient versions of functions when they are instantiated over value types. Most importantly, generics let you efficiently detect what “T” is at runtime, and this increases your range of options for writing this kind of “static-to-dynamic transformation” code.

So, .NET generics will make it easier and more efficient for other .NET programmers to use functionality defined in dynamic languages. In the long run this can only be a win for the dynamic languages themselves. 

Summary

In closing, thank you for reading this far <g>. I'm hoping you'll play with IronPython and get some ideas on how you too can make your language work on the CLR. I'd love to see the new applications you can write using libraries like Avalon and Indigo. I'm looking forward to hearing your feedback to make the system better!