An IronPython Profiler

I was recently working on improving the performance of a Python-based application and thought it would be useful to profile my code.  My intuition was telling me that the problem was in a certain area of the code, but I wanted hard numbers to back up this intuition before I went to the trouble of doing a rewrite.  Unfortunately, neither the CLR Profiler nor the profiling support in Visual Studio are very effective when used with IronPython, as they weren’t really created for environments where code is dynamically compiled at runtime.

But Tomáš had written a profiler for IronRuby nearly half a year ago, so I thought I’d port this to IronPython, where it’s now experimentally available in IronPython 2.6 Alpha 1.

The way Tomáš’ profiler works is pretty straightforward.  At the start of every method we compile, a snapshot is taken of the current time.  Another is taken at the end of the method and the difference is stored for later retrieval.  The only tricky thing about it is trying to update the statistics in a manner that’s both thread-safe and has minimal effect on performance.

I’ve modified it a bit from the original, though, and added some fun features.  In particular, it now tracks the number of calls made to each method and also keeps separate counters for the amount of time spent exclusively inside the method versus the duration spent inclusively in both this method and in any methods it calls.

Additionally, I’ve added counters to track calls made from Python code into statically-compiled .NET code.

 

Here’s a sample program profile.py that demonstrates the statistics we’re gathering by printing them on exit.

 import atexit
import clr
from System.Threading import Thread

def delay(seconds):
    Thread.CurrentThread.Join(1000 * seconds)

def print_profile():
    for p in clr.GetProfilerData():
        print '%s\t%d\t%d\t%d' % (p.Name, p.InclusiveTime, p.ExclusiveTime, p.Calls)

atexit.register(print_profile)
delay(1)

 

The profiler isn’t enabled by default, as it does have a small impact on performance.  You need to opt into it by running IronPython with the flag “-X:EnableProfiler”.  If you use this flag with the above code, you’ll get the following output:

Name Inclusive Exclusive Calls
module site 61440 61440 1
module profile: def delay(seconds) 10721609 44 1
module profile: def print_profile() 20 20 1
module profile 23827771 169190 1
type Builtin: method: __import__(CodeContext, String, Object, Object, Object, Int32) 9372193 9372193 4
module atexit: def _run_exitfuncs() 268844 69133 1
module atexit: def register(func, targs, kargs) 271181 4728 1
module atexit 6635215 113043 1
type Builtin: method: hasattr(CodeContext, Object, String) 4336493 4336493 1
type List: method: append(Object) 14677 14677 1
type Thread: method: get_CurrentThread() 83 83 1
type Thread: method: Join(Int32) 10077140 10077140 1
type List: method: pop() 17810 17810 1
type ClrModule: method: GetProfilerData(CodeContext) 0 0 1

These records are returned in the order that they were compiled, not the order that they were first called – except that methods which were never called are omitted entirely.

A record with the name “module site” means the top-level import of the site module.

Other records that start with the name “module” refer to Python code.

Records that start with “type” refer to .NET code that is called directly from IronPython.

Times are recorded in “Ticks”, which are increments of 100ns.  That means that the 10077140 ticks spent in the Thread.Join method are actually 1.007714 second.

Here’s what this profiler doesn’t do a good job of recording: the time required to parse Python, compile it into expression trees, generate the IL and JIT the IL.  Some of this is accounted for in the time it takes to import an external module, but not in a way that provides visibility to the constituent parts.

 

We’d love to get your feedback on this experimental addition to IronPython.  Happy profiling!