Speed of direct calls vs interfaces vs delegates vs virtuals

I've gotten a couple of follow-up questions on my column on dynamic dispatch, asking why direct calls, interface calls, virtual calls, and delegate calls differ in speed.

I'm not Jan or Rico, who know a lot more about these topics than I do (hint, hint: ask them through their blogs if I don't answer your question), but I can give you the big picture.

Consider the following code:

interface IProcessor
{
    void Process();
}
class Processor: IProcessor
{
    public void Process()
    {
    }
}

If I write something like:

Processor p = new Processor();
p.Process();

The compiler will emit code that tightly binds to Processor.Process(). In other words, the only function that could be called is that function.

That means the JIT can easily inline the call, eliminating the call overhead entirely. A discussion of when the JIT will and won't inline is beyond the scope of this post, but suffice it to say that small functions (below a size threshold) will be inlined, subject to some constraints.

A brief aside: even though C# is making a direct call, you'll find that it uses the callvirt (i.e., virtual call) instruction to do it. It does this because callvirt has a built-in null check, which means you get an exception on the invocation rather than on a dereference inside the function (or no exception at all, if the function never touches this).
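Here's a small sketch of what that null check buys you. Process() below has an empty body, so nothing inside it could ever fault, yet calling it through a null reference still throws NullReferenceException at the call site. The names CheckedProcessor and NullCheckDemo are made up for the example.

```csharp
using System;

public class CheckedProcessor
{
    // Never touches 'this', so the body itself cannot throw.
    public void Process() { }
}

public static class NullCheckDemo
{
    // Returns true if calling Process() on a null reference throws.
    public static bool CallOnNullThrows()
    {
        CheckedProcessor p = null;
        try
        {
            // C# emits callvirt here even though Process() isn't virtual,
            // so the null check happens as part of the call itself.
            p.Process();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```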

Anyway, the direct call can easily be inlined, which is very good from a speed perspective.

But what if I have code like this:

class D
{
    public void Dispatch(IProcessor processor)
    {
        processor.Process();
    }
}

Processor p = new Processor();
D d = new D();
d.Dispatch(p);

In the calling code, we know that the function could only be Processor.Process(), but in Dispatch(), all the compiler knows is that it has an IProcessor reference, which could point to an instance of any type that implements IProcessor.

There is therefore no way for the JIT to inline this call: the level of indirection in an interface call means the target isn't known until run time. That's the source of the slowdown.
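To make the "which function?" problem concrete, here's a sketch with one call site and two implementations. It uses a hypothetical variant of IProcessor (ITextProcessor) whose Process() returns a string so the result is observable; the class names are made up for illustration.

```csharp
// Hypothetical variant of IProcessor that returns a value.
public interface ITextProcessor
{
    string Process(string input);
}

public class UpperProcessor : ITextProcessor
{
    public string Process(string input) => input.ToUpperInvariant();
}

public class LowerProcessor : ITextProcessor
{
    public string Process(string input) => input.ToLowerInvariant();
}

public static class InterfaceDispatchDemo
{
    // A single call site. When this method is compiled, all that is known
    // is the static type ITextProcessor, so which Process() runs is decided
    // at run time from the object's actual type.
    public static string Dispatch(ITextProcessor processor, string input)
        => processor.Process(input);
}
```

Calling Dispatch(new UpperProcessor(), "MiXeD") and Dispatch(new LowerProcessor(), "MiXeD") runs different code from the same line, which is exactly why the JIT can't paste either body in place of the call.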

Virtual functions present a similar case. Like interfaces, they have a level of indirection, and the compiler can't know what type it will really get (OK, perhaps it can, but I'll cover that later).
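The virtual version of the same situation looks like this. The shape classes are hypothetical; the point is that Describe() only sees the static type ShapeBase, and the override that actually runs is looked up through the object's method table at run time.

```csharp
public class ShapeBase
{
    public virtual string Name() => "shape";
}

public class Circle : ShapeBase
{
    public override string Name() => "circle";
}

public class Square : ShapeBase
{
    public override string Name() => "square";
}

public static class VirtualDispatchDemo
{
    // Only ShapeBase is known here; the override is chosen at run time.
    public static string Describe(ShapeBase shape) => shape.Name();
}
```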

Delegates also have a level of indirection. In the first release of .NET, our implementation of delegates wasn't as optimal as it could have been, and had overhead on top of the cost of not being inlineable. In Whidbey, that overhead is gone, and my tests (don't trust my tests) show that delegate calls are about as fast as interface calls, which is pretty much what one would expect.

My guess is that it was schedule pressures in V1 that kept us from providing the optimized version, but it's also possible that we didn't think deeply enough about the problem initially.
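If you want to run your own (equally untrustworthy) tests, here's a rough sketch of a harness that times the four kinds of call against the same object. Counter, ICounter, CounterBase, and CallBenchmark are hypothetical names, and the timings are crude: there's no warm-up or statistics, and each loop's JIT compilation is included in its time.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public interface ICounter
{
    void Increment();
}

public class CounterBase
{
    public virtual void IncrementVirtual() { }
}

// One object reachable four ways: directly, through an interface,
// through a base-class (virtual) reference, and through a delegate.
public sealed class Counter : CounterBase, ICounter
{
    public long Count;

    public void Increment() => Count++;                 // direct, interface, and delegate target
    public override void IncrementVirtual() => Count++; // virtual target
}

public static class CallBenchmark
{
    // Runs each kind of call 'iterations' times and returns elapsed
    // Stopwatch ticks per kind; totalCalls reports how many increments ran.
    public static Dictionary<string, long> Run(int iterations, out long totalCalls)
    {
        var counter = new Counter();
        ICounter viaInterface = counter;
        CounterBase viaBase = counter;
        Action viaDelegate = counter.Increment;

        var results = new Dictionary<string, long>
        {
            ["direct"] = Time(() => { for (int i = 0; i < iterations; i++) counter.Increment(); }),
            ["interface"] = Time(() => { for (int i = 0; i < iterations; i++) viaInterface.Increment(); }),
            ["virtual"] = Time(() => { for (int i = 0; i < iterations; i++) viaBase.IncrementVirtual(); }),
            ["delegate"] = Time(() => { for (int i = 0; i < iterations; i++) viaDelegate(); }),
        };

        totalCalls = counter.Count;
        return results;
    }

    private static long Time(Action body)
    {
        var stopwatch = Stopwatch.StartNew();
        body();
        stopwatch.Stop();
        return stopwatch.ElapsedTicks;
    }
}
```

Call Run() once to warm things up before reading any numbers, and for results you intend to rely on, a harness such as BenchmarkDotNet handles warm-up and statistics for you. Also note that newer runtimes can use profile-guided optimization to devirtualize some interface, virtual, and delegate calls, so the gaps you measure may be much smaller than the ones described here.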

So, back to virtual functions.

You'd like to be able to inline virtuals, but it's a difficult problem. You could conceivably do a whole-program static analysis and know that a given call didn't have to be virtual, and therefore be able to inline it.

That is, assuming you knew that the set of types was static, which isn't the case in environments where you can dynamically load code at runtime.
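Here's a sketch of why the set of types isn't static. The concrete type is chosen from a string at run time; in a real program that string might come from a config file, and the type might live in an assembly loaded with Assembly.LoadFrom. Plugin, EchoPlugin, ShoutPlugin, and PluginHost are hypothetical names.

```csharp
using System;

public abstract class Plugin
{
    public abstract string Run();
}

public sealed class EchoPlugin : Plugin
{
    public override string Run() => "echo";
}

public sealed class ShoutPlugin : Plugin
{
    public override string Run() => "ECHO!";
}

public static class PluginHost
{
    // The type name arrives as data, so no analysis of this method at
    // compile time can list every type whose Run() might be called here.
    public static string Execute(string typeName)
    {
        Type type = typeof(Plugin).Assembly.GetType(typeName, throwOnError: true);
        Plugin plugin = (Plugin)Activator.CreateInstance(type);
        return plugin.Run();
    }
}
```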

A JIT by a celestially-named company has an interesting technique for getting around the indirection in virtuals. It inlines virtual functions that don't currently require virtual dispatch, and then tracks whether it needs to change that decision later on (using the aptly named “dynamic deoptimization”).
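The idea can be illustrated with a toy model. This is not how that JIT (or any real one) is implemented: a real JIT patches generated machine code, while this sketch just flips a flag. The call site assumes Dog is the only implementation of Animal.Legs() and uses Dog's body directly; when another subclass is "loaded", the assumption is broken and the call site falls back to ordinary virtual dispatch. All names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public abstract class Animal
{
    public abstract int Legs();
}

public sealed class Dog : Animal
{
    public const int LegCount = 4;
    public override int Legs() => LegCount;
}

public sealed class Bird : Animal
{
    public override int Legs() => 2;
}

public sealed class ToyCallSite
{
    // Subclasses of Animal "loaded" so far.
    private readonly HashSet<Type> _loaded = new HashSet<Type>();

    // True while the call site is optimized under the assumption that
    // Dog is the only implementation of Animal.Legs().
    private bool _assumesOnlyDog;

    public int Deoptimizations { get; private set; }

    // Stands in for the class loader notifying the JIT of a new type.
    public void OnTypeLoaded(Type type)
    {
        _loaded.Add(type);
        bool onlyDog = _loaded.Count == 1 && _loaded.Contains(typeof(Dog));
        if (onlyDog)
        {
            _assumesOnlyDog = true;   // optimize: the target is known
        }
        else if (_assumesOnlyDog)
        {
            _assumesOnlyDog = false;  // assumption broken: deoptimize
            Deoptimizations++;
        }
    }

    public int Call(Animal animal)
    {
        // Fast path: Dog.Legs() "inlined" as its constant body, no dispatch.
        if (_assumesOnlyDog) return Dog.LegCount;

        // Slow path: ordinary virtual dispatch.
        return animal.Legs();
    }
}
```

Without the deoptimization step, the fast path would keep returning 4 for a Bird once Bird was loaded, which is why the tracking is essential rather than an optional refinement.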

Inlining virtuals matters more in that company's environment because all functions are virtual by default, which means you have a ton of virtual functions that don't need to be. That's less of an issue in .NET, where methods are non-virtual unless you mark them virtual, so virtual calls happen less often.
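A quick sketch of the C# default: a method is bound at compile time unless you opt in with virtual. Calling through a base-class reference shows the difference, since the non-virtual method (hidden, not overridden, with new) runs the base version while the virtual one dispatches to the derived override. The widget names are hypothetical.

```csharp
public class BaseWidget
{
    // Non-virtual (the C# default): bound at compile time.
    public string Plain() => "BaseWidget.Plain";

    // Opt-in virtual: resolved at run time.
    public virtual string Virtual() => "BaseWidget.Virtual";
}

public class FancyWidget : BaseWidget
{
    // 'new' hides the base method rather than overriding it.
    public new string Plain() => "FancyWidget.Plain";

    public override string Virtual() => "FancyWidget.Virtual";
}

public static class DefaultsDemo
{
    // Returns ("BaseWidget.Plain", "FancyWidget.Virtual").
    public static (string plain, string virt) CallThroughBase()
    {
        BaseWidget widget = new FancyWidget();
        return (widget.Plain(), widget.Virtual());
    }
}
```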

I think that about covers it, and I got through the whole post without mentioning Java once (Oh, what a giveaway!)

[Update: Shane comments that you should be able to inline the interface call because you know what type it is called on. In a case like this, that would be possible, but in the general case it would be somewhat difficult. You could have a base-class reference to a derived class (for example), which would mean you didn't know the real type, or you could have dynamically loaded code. Even when that isn't the case, the JIT would have to trace from the point where it knew the type down through multiple levels to be able to do the analysis, and there are certainly cases where the JIT couldn't know the true type.]