Static Compilation of IronPython scripts

The ability to compile IronPython scripts into .NET IL and to save them to disk existed in IronPython 1.0 but has been missing in 2.0 so far. With IronPython 2.0 Beta4 this has been added back.

Why would I compile dynamic language scripts?

There are a lot of reasons to compile scripts into a binary form. Shri talks about some of them here. For folks who don't want to distribute source code in plain text this provides one level of obfuscation. To that end, a function called CompileModules has been added to the clr module to compile scripts into executable IL. The signature of the function is:

 CompileModules(str assemblyName, dict kwArgs, Array[str] filenames)

So to compile a file foo.py into foo.dll you would do this:

    import clr
    clr.CompileModules("foo.dll", "foo.py")

The compiled assembly can now be loaded with the regular clr.AddReference. When clr.AddReference sees a compiled assembly, it publishes the modules it contains as well, so you can simply import the module in your code.

    clr.AddReference("foo.dll")
    import foo
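The two steps above can be wrapped in one helper. This is a minimal sketch, assuming it runs under IronPython, where the clr module provides CompileModules; assembly_name_for and compile_and_load are hypothetical names chosen for illustration, and the helper does nothing on an interpreter that lacks clr.CompileModules.

```python
import os

try:
    import clr  # provided by IronPython
except ImportError:
    clr = None  # not running under IronPython


def assembly_name_for(script_path):
    """Return the DLL name for a script, e.g. 'foo.py' -> 'foo.dll'."""
    base = os.path.splitext(os.path.basename(script_path))[0]
    return base + ".dll"


def compile_and_load(script_path):
    """Compile one script into a DLL and reference it so it can be imported.

    Returns the DLL name, or None when clr.CompileModules is unavailable.
    """
    if clr is None or not hasattr(clr, "CompileModules"):
        return None
    dll = assembly_name_for(script_path)
    clr.CompileModules(dll, script_path)  # same call as in the text above
    clr.AddReference(dll)                 # publishes the compiled module
    return dll
```

After compile_and_load("foo.py") succeeds, a plain "import foo" picks up the compiled module rather than the source file.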

Multiple files and main

The function can take multiple Python files and compile them into one dll. What if you want it to be a standalone executable? There are two things to be done. First, a stub exe is needed that can load the dll. Second, a way to distinguish the main module is needed. The keyword args that CompileModules can take come in handy here:

    import clr
    clr.CompileModules("foo.dll", "foo.py", "bar.py", mainModule="main.py")

Now a stub exe can be written that loads up this compiled dll. The IronPython sample pyc.py shows how to generate such a stub exe.
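For the CompileModules call itself, the list of scripts is often only known at run time. The sketch below shows one way to assemble the call from a list and an optional main module, assuming it runs under IronPython. The helper names compile_call_arguments and compile_modules are hypothetical; only CompileModules and its mainModule keyword come from the text above.

```python
try:
    import clr  # provided by IronPython
except ImportError:
    clr = None  # not running under IronPython


def compile_call_arguments(assembly_name, filenames, main_module=None):
    """Build the positional and keyword arguments for clr.CompileModules."""
    args = [assembly_name] + list(filenames)
    kwargs = {}
    if main_module is not None:
        # mainModule marks the script that acts as the entry point.
        kwargs["mainModule"] = main_module
    return args, kwargs


def compile_modules(assembly_name, filenames, main_module=None):
    """Compile the given scripts into one assembly.

    Returns True if CompileModules was called, False if it is unavailable.
    """
    args, kwargs = compile_call_arguments(assembly_name, filenames, main_module)
    if clr is None or not hasattr(clr, "CompileModules"):
        return False
    clr.CompileModules(*args, **kwargs)
    return True
```

Calling compile_modules("foo.dll", ["foo.py", "bar.py"], "main.py") is then equivalent to the literal call shown earlier.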

Wait, what is -X:SaveAssemblies mode then?

When IronPython is started with -X:SaveAssemblies, it generates a dll containing IL corresponding to the code it executed. Sounds an awful lot like compilation, doesn't it? The difference is that the compiled output is executable IL and the SaveAssemblies output isn't.

To understand the difference, one needs to understand that IronPython generates IL anyway in the normal course of its execution. Every statement is converted to the DLR AST, and IL is emitted for the ASTs and then executed. The SaveAssemblies mode simply dumps the generated IL into a dll. It is meant as a debugging device. So what is missing from this IL that prevents it from being re-executable code? The short answer is Dynamic Sites. The sites that are generated during the execution are not persisted. The compilation feature fills exactly this gap: it persists the dynamic sites as well. Let's look at an example and compare the generated IL in Reflector. (Only the relevant code is copied over from Reflector.) This Python code:

    print 2 + 5
    print 2 * 5
    print 3 / 5

when run with -X:SaveAssemblies mode produces this code:

    public static CallSite<DynamicSiteTarget<int, int, object>> #Constant207;
    public static CallSite<DynamicSiteTarget<int, int, object>> #Constant208;
    public static CallSite<DynamicSiteTarget<int, int, object>> #Constant209;

    $lineNo = 1;
    PythonOps.Print(__global_context, #Constant207.Target(#Constant207, 2, 5));
    $lineNo = 2;
    PythonOps.Print(__global_context, #Constant208.Target(#Constant208, 2, 5));
    $lineNo = 3;
    PythonOps.Print(__global_context, #Constant209.Target(#Constant209, 3, 5));

Notice that the constants here are defined as fields on the generated type. This type is never instantiated, so the sites are never assigned anywhere. The same Python code, when compiled with clr.CompileModules, produces this code:

    object[] objArray = new object[] {
        CallSite<DynamicSiteTarget<int, int, object>>.Create(PythonOps.MakeOperationAction(context, "Add")),
        CallSite<DynamicSiteTarget<int, int, object>>.Create(PythonOps.MakeOperationAction(context, "Multiply")),
        CallSite<DynamicSiteTarget<int, int, object>>.Create(PythonOps.MakeOperationAction(context, "Divide"))
    };

    line = 1;
    PythonOps.Print(context, ((CallSite<DynamicSiteTarget<int, int, object>>)objArray[0]).Target(
        (CallSite<DynamicSiteTarget<int, int, object>>)objArray[0], 2, 5));
    line = 2;
    PythonOps.Print(context, ((CallSite<DynamicSiteTarget<int, int, object>>)objArray[1]).Target(
        (CallSite<DynamicSiteTarget<int, int, object>>)objArray[1], 2, 5));
    line = 3;
    PythonOps.Print(context, ((CallSite<DynamicSiteTarget<int, int, object>>)objArray[2]).Target(
        (CallSite<DynamicSiteTarget<int, int, object>>)objArray[2], 3, 5));

You can see that all the dynamic call sites are being created here and their targets are being invoked. This then is perfectly executable code - maybe not as succinct as the python code but it does the same thing :)
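The difference between the two listings can be mimicked in plain Python. What follows is only a toy analogy, not the DLR implementation: ToyCallSite, SavedAssembliesStyle and CompiledModuleStyle are made-up names, and the "site" is reduced to an object that remembers which operation to perform. The point it tries to show is that code which refers to sites that were never assigned cannot run, while code that creates its sites first can.

```python
import operator


class ToyCallSite(object):
    """Toy stand-in for a dynamic call site (an assumption, not the DLR type)."""

    def __init__(self, operation):
        self.operation = operation

    def target(self, site, left, right):
        # Mirrors the shape of site.Target(site, arg0, arg1) in the listings.
        return site.operation(left, right)


class SavedAssembliesStyle(object):
    # The site exists only as a field; nothing ever assigns it.
    constant207 = None

    @staticmethod
    def run():
        site = SavedAssembliesStyle.constant207
        return site.target(site, 2, 5)  # fails: site is still None


class CompiledModuleStyle(object):
    @staticmethod
    def run():
        # The sites are created up front, then their targets are invoked.
        sites = [ToyCallSite(operator.add), ToyCallSite(operator.mul)]
        return [sites[0].target(sites[0], 2, 5),
                sites[1].target(sites[1], 2, 5)]
```

CompiledModuleStyle.run() computes both results, whereas SavedAssembliesStyle.run() raises an AttributeError because its site was never created.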