I was recently working on improving the performance of a Python-based application and thought it would be useful to profile my code. My intuition was telling me that the problem was in a certain area of the code, but I wanted hard numbers to back up this intuition before I went to the trouble of doing a rewrite. Unfortunately, neither the CLR Profiler nor the profiling support in Visual Studio is very effective when used with IronPython, as they weren't really designed for environments where code is dynamically compiled at runtime.
The way Tomáš’ profiler works is pretty straightforward. At the start of every method we compile, a snapshot is taken of the current time. Another is taken at the end of the method and the difference is stored for later retrieval. The only tricky thing about it is trying to update the statistics in a manner that’s both thread-safe and has minimal effect on performance.
I’ve modified it a bit from the original, though, and added some fun features. In particular, it now tracks the number of calls made to each method, and it keeps separate counters for the time spent exclusively inside the method itself versus the time spent inclusively in both the method and everything it calls.
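One way to account for inclusive versus exclusive time is to keep a per-thread stack of accumulators, so a method can subtract the time attributed to the profiled methods it calls. Here's a sketch of that bookkeeping under assumed names (`call_profiled`, `stats`); again, this is an illustration of the technique, not the DLR's code.

```python
import threading
import time

_tls = threading.local()
_lock = threading.Lock()
stats = {}  # name -> [inclusive_seconds, exclusive_seconds, calls]

def call_profiled(name, fn, *args, **kwargs):
    stack = getattr(_tls, 'stack', None)
    if stack is None:
        stack = _tls.stack = []
    stack.append(0.0)            # accumulates the callees' inclusive time
    start = time.time()
    try:
        return fn(*args, **kwargs)
    finally:
        inclusive = time.time() - start
        children = stack.pop()
        if stack:                # charge our inclusive time to our caller
            stack[-1] += inclusive
        with _lock:
            rec = stats.setdefault(name, [0.0, 0.0, 0])
            rec[0] += inclusive
            rec[1] += inclusive - children  # exclusive = inclusive minus callees
            rec[2] += 1
```

For a method that calls no other profiled methods, inclusive and exclusive time come out the same, which matches the leaf entries in the output below.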
Additionally, I’ve added counters to track calls made from Python code into statically-compiled .NET code.
Here’s a sample program profile.py that demonstrates the statistics we’re gathering by printing them on exit.
import atexit
import clr
from System.Threading import Thread

def delay(seconds):
    Thread.CurrentThread.Join(1000 * seconds)

def print_profile():
    for p in clr.GetProfilerData():
        print '%s\t%d\t%d\t%d' % (p.Name, p.InclusiveTime, p.ExclusiveTime, p.Calls)

atexit.register(print_profile)

delay(1)
The profiler isn’t enabled by default, as it does have a small impact on performance. You need to opt into it by running IronPython with the flag “-X:EnableProfiler”. If you use this flag with the above code, you’ll get the following output:
Name | InclusiveTime | ExclusiveTime | Calls
module profile: def delay(seconds) | 10721609 | 44 | 1
module profile: def print_profile() | 20 | 20 | 1
type Builtin: method: __import__(CodeContext, String, Object, Object, Object, Int32) | 9372193 | 9372193 | 4
module atexit: def _run_exitfuncs() | 268844 | 69133 | 1
module atexit: def register(func, targs, kargs) | 271181 | 4728 | 1
type Builtin: method: hasattr(CodeContext, Object, String) | 4336493 | 4336493 | 1
type List: method: append(Object) | 14677 | 14677 | 1
type Thread: method: get_CurrentThread() | 83 | 83 | 1
type Thread: method: Join(Int32) | 10077140 | 10077140 | 1
type List: method: pop() | 17810 | 17810 | 1
type ClrModule: method: GetProfilerData(CodeContext) | 0 | 0 | 1
These records are returned in the order that they were compiled, not the order that they were first called – except that methods which were never called are omitted entirely.
A record with the name “module site” means the top-level import of the site module.
Other records that start with the name “module” refer to Python code.
Records that start with “type” refer to .NET code that is called directly from IronPython.
Times are recorded in “ticks”, which are increments of 100ns. That means the 10077140 ticks spent in the Thread.Join method are actually 1.007714 seconds.
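The conversion is easy to check with a couple of lines of Python (the helper name is mine):

```python
TICKS_PER_SECOND = 10 ** 7  # one tick is 100 ns, so 10,000,000 ticks per second

def ticks_to_seconds(ticks):
    return ticks / float(TICKS_PER_SECOND)

# The Thread.Join entry from the output above:
print(ticks_to_seconds(10077140))  # roughly 1.007714 seconds
```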
Here’s what this profiler doesn’t do a good job of recording: the time required to parse Python code, compile it into expression trees, generate the IL, and JIT that IL. Some of this is accounted for in the time it takes to import an external module, but not in a way that provides visibility into the constituent parts.
We’d love to get your feedback on this experimental addition to IronPython. Happy profiling!