More fun with call vs callvirt


A while back I posted a little bit about one implication of call vs. callvirt, and the other day I ran into another one that I thought I would share.  This one is part of a class of issues where C# or VB developers assume that if their language can’t do something, it is not doable.  It turns out that is not always true, and you can get yourself in trouble for thinking that way.

 

Let’s say I have a little collection that I want to ensure only stores even numbers.  As Whidbey is not quite out yet, I can’t use List<int>, so I subclass ArrayList and override the Add method to ensure that only even numbers get in.

 

      public class EvenCollection : ArrayList

      {

            public EvenCollection()

            {

            }

            public override int Add(object value)

            {

                  if (value is Int32 && IsEven ((int) value))

                        return base.Add (value);

                  else throw new Exception ("Value must be an even int");

            }

            internal bool IsEven (int value)

            {

                  return value % 2 == 0;

            }

      }

 

I repeat for all the other methods on ArrayList which allow new elements to be added to the list. 

 

Then I do a couple of tests in C# to be sure everything is working as expected, and indeed it is.

      EvenCollection ec = new EvenCollection();

      ec.Add (2); // works

      ec.Add (3); //throws as expected

 

Just to be sure I really do understand OOP 101, I make sure that calls through the base class (ArrayList) still execute my Add() method, and indeed they do:

      ArrayList l = new EvenCollection();

      l.Add (2); // works

      l.Add (3); //throws as expected

 

 

But here is the trick: try this in C++.  Calling the method the normal way works as expected, but C++ also lets you call any implementation of a virtual method, not just the most derived one.

      EvenCollection* ec = new EvenCollection();

      ec->Add(__box(2)); //works as expected

      ec->ArrayList::Add(__box(3)); //Utz!  Does not throw an exception

 

The syntax “ArrayList::Add” tells the compiler to call the implementation of Add() on the ArrayList class rather than the most derived one.  Calling code cannot do this in C# or VB.NET, but it IS supported in the CLR.
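The same effect can be reproduced outside the CLR. Here is a minimal sketch in standard (unmanaged) C++, assuming a made-up IntList class as a stand-in for ArrayList and EvenList as a stand-in for EvenCollection; none of these names come from the framework, and the helper OddValueSlipsThrough exists only for illustration.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for ArrayList: a growable list with a virtual Add.
class IntList {
public:
    virtual ~IntList() = default;

    // Appends the value and returns the index it was stored at.
    virtual int Add(int value) {
        items_.push_back(value);
        return static_cast<int>(items_.size()) - 1;
    }

    int Count() const { return static_cast<int>(items_.size()); }

private:
    std::vector<int> items_;
};

// Hypothetical stand-in for EvenCollection: the override rejects odd values.
class EvenList : public IntList {
public:
    int Add(int value) override {
        if (value % 2 != 0)
            throw std::invalid_argument("Value must be an even int");
        return IntList::Add(value);
    }
};

// Returns true when the virtual call rejects 3 but the qualified call
// stores it anyway.
bool OddValueSlipsThrough() {
    EvenList list;
    list.Add(2);                  // virtual dispatch: the parity check runs

    IntList* asBase = &list;
    bool rejected = false;
    try {
        asBase->Add(3);           // still virtual: EvenList::Add throws
    } catch (const std::invalid_argument&) {
        rejected = true;
    }

    list.IntList::Add(3);         // qualified call: the override is skipped
    return rejected && list.Count() == 2;
}
```

The qualified call names one specific implementation, so the compiler emits a direct call instead of a virtual dispatch, and the check in the override never runs.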

 

This is the basic difference between the call and the callvirt instructions that was the subject of a recent contest. Let’s look at the IL:

IL_001b:  callvirt   instance int32 [OverrideDemoLibrary]OverrideSecIssue.EvenCollection::Add(object)

IL_002a:  call       instance int32 [mscorlib]System.Collections.ArrayList::Add(object)

The instruction at IL_001b is the same one that C# and VB emit: a callvirt.  The instruction at IL_002a is one that only C++ supports (that I know of): a non-virtual call to a virtual member.  This is legal and verifiable IL.

Why is this important? 

You should be aware of this issue any time you rely on virtual members to guarantee semantics for correctness or security.  Imagine if this were a list of only authenticated users, or of strings that have been validated.  The lesson here is to not inherit from ArrayList.  It is meant to be delegated to rather than inherited from.
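To make “delegated to rather than inherited from” concrete, here is a minimal sketch of the delegation approach, again in standard C++ with std::vector standing in for ArrayList. The class shape and the member names Add, At and Count are assumptions chosen for illustration, not an API from the framework.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// The collection owns its storage instead of inheriting from it, so there
// is no base-class Add that a caller could reach around the check.
class EvenCollection {
public:
    // The only way in: every value passes the parity check first.
    // Returns the index the value was stored at.
    std::size_t Add(int value) {
        if (value % 2 != 0)
            throw std::invalid_argument("Value must be an even int");
        items_.push_back(value);
        return items_.size() - 1;
    }

    // Read access is forwarded to the private list; at() bounds-checks.
    int At(std::size_t index) const { return items_.at(index); }

    std::size_t Count() const { return items_.size(); }

private:
    std::vector<int> items_;  // never exposed to callers
};
```

Because the inner list is private and is not a base class, a qualified call has nothing to target. The cost is that every operation you want to expose has to be forwarded by hand.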

 

Note: this is a repost

Comments (10)

  1. Ovidiu says:

    <comment>

    <offTopic>

    Shouldn’t it be ‘throw new Exception("Value must be an even int");’ ?

    </offTopic>

    <onTopic>

    Interestingly enough, after playing a bit with IlDasm I see that all calls originating from an instance method and targeting another non-virtual instance method of the same object are translated to ‘call’ (obviously, since you’re executing the current method it’s clear that ‘this’ is non-null), while any other call goes to ‘callvirt’.

    How does this affect performance? Since the method I’m calling with ‘callvirt’ is non-virtual I assume the JIT-ter does some magic behind the scenes and doesn’t do much extra work in the end. Oh, boy, here we go with another debugging session… 🙂

    </onTopic>

    </comment>

  2. Delegate to ArrayList… sure, except that deriving from ArrayList and adding a ‘new’ indexer is the easiest way to create a ‘strongly typed’ collection. It may not be correct, or verifiable, but it certainly is short and sweet.

    Some days I ask myself "what were they thinking, releasing a language without generics".

  3. Brad,

    >> The lesson here to not inherit from ArrayList… It is meant to be delegated to rather than inherited from.

    now, wouldn’t it be great if this information was available in the documentation to arraylist?

    WM_CHEERS

    thomas woelfer

  4. RJ says:

    Actually, I don’t think language features should ever be confused with security features. Trusted code should be protected from Untrusted code by a pass through the kernel.

    Using something like "ec->ArrayList::Add(" is a very unusual thing for a c++ developer to do, I don’t think it could be done by mistake. If a developer does this, any resulting failures in correctness are well deserved.

  5. Andy says:

    While I realize that this isn’t the point of this post, would this work as expected in all .NET languages if the developer subclasses CollectionBase instead of ArrayList? Sure, all of the methods of IList would have to be implemented instead of just those which add members to the array/list, but this way you couldn’t call base functionality to get around inserting even numbers because CollectionBase does not contain any methods which add elements to the array.

  6. The virtual members of CollectionBase are all protected, so the only way to circumvent them (short of using reflection in a fully-trusted context), would be to inherit from your collection and override your virtual methods. This can be easily solved by either 1) sealing your class 2) sealing those protected virtual methods, 3) not exposing a public or protected constructor.

    I always assumed that the ArrayList methods were virtual so that the Synchronized, ReadOnly, and IList wrapper versions could be implemented. I never agreed with this decision, and in a few projects I have used my own Vector class in order to avoid the virtual dispatch of ArrayList.

  7. K says:

    I am not sure if I understand this correctly. My experiment with your code gave the following results.

    return base.Add (value);

    IL_0021: call instance int32 [mscorlib]System.Collections.ArrayList::Add(object)

    return Add (value);

    IL_0021: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)

    C# does generate both call and callvirt instruction in this case. My understanding of your blog was that only C++ should be able to generate the call instruction here but the first case contradicts that?

    Thanks,

    K

  8. Calling a base class’s version of a method from an overridden version of that method using the "base" qualifier in C# always (at least I think it always) results in a call rather than a callvirt. Otherwise, it’d be impossible to call the base implementation, as it would cause an infinite recursion. Since you can only call a base method from an inheriting class, this doesn’t violate encapsulation.

    If an external class calls a virtual method without callvirt, it could lead to problems, which is why most languages don’t let you do it.

    I should mention that VB has the MyClass qualifier, which lets a class use a call instead of a callvirt on one of its own methods. But if a class does that to itself it is its own fault if something breaks.

    Oh, and I believe that a call to a method using "base" from any method *other* than the overriding method is exactly the same as using "this". For example, if a "public override void A()" calls "base.B()", it will still be a callvirt.
