Putting a base in the middle


Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.
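To make the setup concrete, here is a minimal driver (the Program class and Main method are my invention, not part of the customer's report), compiled against all three original DLLs:

```csharp
// Hypothetical driver program referencing Alpha.DLL, Bravo.DLL
// and Charlie.DLL as defined above.
class Program
{
    static void Main()
    {
        Charlie c = new Charlie();
        c.M(); // Prints "Charlie" then "Alpha"
    }
}
```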

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method that the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL, and it has the same semantics no matter what Bravo.DLL says: it calls Alpha.M.
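You can see this by opening the compiled Charlie.DLL in ILDASM; the body of Charlie.M contains something like the following (approximate IL from memory, with assembly references and attributes elided):

```
.method public hidebysig virtual instance void M() cil managed
{
    ldstr  "Charlie"
    call   void [mscorlib]System.Console::WriteLine(string)
    ldarg.0
    // Non-virtual call, bound to Alpha.M when Charlie.DLL was compiled;
    // nothing in the new Bravo.DLL can change this token.
    call   instance void [Alpha]Alpha::M()
    ret
}
```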

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR would fall back to calling Alpha.M if there were no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is the compiler is generating code on the basis of today’s static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing “mental models” of what “base.M” means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by “non-virtual call”. An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a “slot” at compile time but not the “contents” of that slot. The “contents” – the actual method to call – are identified at runtime, based on what the runtime type of the receiver has stuffed into the virtual method slot.
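A small sketch of the difference, using the Alpha/Charlie pair from above:

```csharp
// Virtual call: the compiler picks the slot; the runtime picks the
// contents of the slot based on the runtime type of the receiver.
Alpha a = new Charlie();
a.M(); // Charlie's override runs, even though 'a' is typed as Alpha.

// Non-virtual call (what base.M() inside Charlie.M compiles to):
// the exact target method is chosen at compile time and baked into
// the IL; the runtime type of the receiver plays no part.
```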

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It’s based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer’s data secure.

Basically, this position is “I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly”.

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don’t care, just pick the best one – whichever one happens to be “most derived” in some future version of the world – by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer’s machines is what I tested.

Basically, this position is “I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I’ve never once tested that.”

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime: the base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.
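To spell the failure out – this is my sketch of the scenario, not the exact historical error message:

```
// What the hypothetical "call Bravo.M" code generation would do here:
//
// Charlie.DLL (compiled against the OLD Bravo.DLL) contains:
//     call instance void [Bravo]Bravo::M()
//
// At runtime, the NEW Bravo.DLL has a method named M with a matching
// signature -- but it is private. The by-signature match finds it,
// the accessibility check fails, and the call cannot be dispatched
// legally. There is no fallback to Alpha.M, because a matching
// signature WAS found. Hence the crash.
```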

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of “private”, implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don’t want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that sometimes crashed the runtime if you introduced a new private method: the original compilation of Charlie calls Bravo.M, even if there is no such method. If later there turns out to be an inaccessible one, it crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which will crash the runtime, and generates a call to Alpha.M.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce “fake” versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a “basecall” instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team all agreed that the desirable behaviour would be to always dynamically bind to the closest base class — a position I personally disagree with, though I see their reasoning. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it’s better to sometimes force a recompilation (which you should have done anyway) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I’ve gotten over the last 24 hours, I think my advice above has been taken rather out of its surrounding context. I’m not saying that every time someone ships a service pack that has a few bug fixes you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated, then:

(1) at the very least test your derived types with the new base type — your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don’t think it’s wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.

Comments (134)

  1. Jan-Olof Stenmark says:

    (Before reading this article, I actually thought the base call was a "magic virtual base call" that called the nearest base class function.)

    But…, the whole point of the "runtime-assembly-binding bindingRedirect" in configuration files is to be able to replace assemblies with new versions without recompiling the code.

    It’s extremely important that one doesn’t place code in new overrides. When Microsoft releases service packs for the framework, do they (you) never create new overrides in non-sealed public classes? I thought this was happening frequently, especially in UI control libraries where you are supposed to inherit from base classes. If they do, everyone has to re-compile all the applications for them to work properly.

  2. Stuart says:

    I seem to remember reading something about the Java developers fixing this bug (yes, I consider it a bug) in JDK1.2 or so.

    I’m with the customer: I find it pretty horrific that this wasn’t fixed. I understand the logic of being conservative about changing things, but this strikes me as dangerous.

    People who have the mental model that the customer has, are likely to use (and in fact I frequently DO use) the ability to override something as a sort of ‘security wrapper’ around the base class methods. I put security in quotes intentionally: I know it’s bypassable if you have full trust or the ability to use reflection or whatever. It’s a guard against doing something *accidentally*, to enforce invariants. For example, consider this in a world pre-generics:

    public class StringList : ArrayList {
      public override int Add(object o) {
        if (!(o is string)) throw new ArgumentException("parameter must be a string");
        return base.Add(o);
      }
      // and override other methods as well similarly
    }

    Suppose that somebody else inherited from a prior version of StringList in which the StringList developer forgot to override Add. Now they can add things into the StringList that aren’t strings. Your approach means that the developer of StringList can’t assume that his overrides will actually be called. Is that really the "conservative" approach?

  3. William Schubert says:

    Couldn’t there be some sort of attribute that tells the compiler that we want bravo to be inserted between alpha and charlie?  Then we could expose this technique in situations where we expect our Type II development to insert method calls, while preventing exposure of other calls?  

  4. Stuart says:

    In fact… I seem to remember that the Java developers considered the bug so problematic that not only did they change the compiler in the next version, but they also *changed the specification of how the runtime handled non-virtual calls to virtual methods* so that code compiled with the previous, buggy versions of the compiler would still run correctly. I can’t remember how they changed it – I think it struck me as hacky, and off the top of my head, I can’t see an algorithm that’d have the desired effect. But I’m fairly sure that’s what they did.

    The C# language syntax for the feature points pretty clearly to the customer’s interpretation, too. It’s "base.whatever()". The base class of Charlie is Bravo, not Alpha. So a plain reading of base.whatever() is "call Bravo’s implementation of this" – which may, in the end, delegate to Alpha’s.

    Note to self: add do-nothing overrides for every virtual method every time I inherit from a class, from now on…

  5. Chris B says:

    The customer-proposed behavior seems to fit better with dynamic languages to me.  A lot of C# is about pushing as much verification as possible into the compiler, such as type analysis and deciding what method to call.  As more work gets moved to runtime, you become more dependent on automated tests for correctness.  In C#, you will almost never end up with a production bug because you didn’t specify the right number of arguments or called a non-existent method, especially if you recompile against new versions of binaries.  In a dynamic language, these things can happen easily without sufficient testing.  So, while the current behavior may seem non-intuitive at first, I think it is the correct behavior for a static language like C#.  I may start to feel differently depending on the direction which the new dynamic features in the language take.

  6. Someguy says:

    It seems like ‘you’ (the fake one having the fake conversation with the fake customer) actually want to be able to write a C# equivalent to ‘this->Alpha::M()’.  When you say "A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer’s data secure" it’s true only so far as you are actually sure what ‘base.Bleck()’ is calling.  The only way to *be* sure in the current model is to read the source of the class that you are inheriting from and actually check whether the method is overridden or not.  That seems a bit sketchy to me, but I can see where you are coming from: we generally expect things to get modified at the leaves, and test or design to that effect.

    So, time for new syntax and breaking changes, hurray!

  8. Stuart says:

    More to the point, if this your mental model:

    "I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer’s data secure."

    This is an example of the "it rather involved being on the other side of this airtight hatchway" attitude: You’re running code linking to a different version of Bravo.dll!

    Unless the ONLY thing you do with Bravo.dll is make base class methods that are expected to skip it entirely and call through to Alpha, you’re ALREADY in an untested scenario and you can’t presume any of your invariants to hold.

    Use some kind of strong naming and assembly binding to guarantee that your code won’t run if the version of Bravo.dll isn’t exactly the one you tested against.

    The current behavior means that the author of Bravo.dll is getting HIS invariants silently bypassed without any ability to do anything about it. If you have to write code that presumes that any time you override a method your override might be silently skipped, that makes it really hard to enforce invariants!

  9. Tobias Berg says:

    By coincidence I posed a similar question last week, both at StackOverflow (http://stackoverflow.com/questions/2476754/method-binding-to-base-method-in-external-library-cant-handle-new-virtual-method) and to you Eric directly, so I’ve been waiting on this post to go live…

    I can understand your arguments but I still don’t fully agree with you so I’ll do my best to make you reconsider.

    Firstly, as far as I can see from talking to other developers and from the existence of the customer in your post (ok, maybe this is not statistically significant but still…), most developers not actually working on the C# compiler do seem to believe that the base call is some kind of magic virtual call.

    I don’t really buy the argument that making the base call semi-virtual would be less safe or predictable; if the evil hackers were able to get you to use their version of the dll, they can probably steal your data and/or crash your program anyway. This is why there are such things as signed libraries.

    The reason why you never heard this question before is probably because most of the time it doesn’t matter; it mostly only matters when you develop a library containing classes meant to be derived from, and where you deploy new versions of the library without recompiling the applications. This is, in my opinion, a not-so-crazy scenario. (In our case we make a framework for building web sites, containing, among other things, controls of various kinds. Web controls have lots of life cycle event methods, OnInit, OnLoad, OnPreRender etc, and at times changes to our basic controls make it necessary to add an override to one of these methods at a new level in the hierarchy. On the other hand we like to be able to deploy new versions of our product to live sites with a minimum of downtime and tricky manual steps like recompiling the entire site…)

    I understand that changing something like this is probably more complicated than I think, but in my opinion this would be a change with very small negative consequences, and the result would be that the base call would work the way most people seem to think it does. As it is right now, we are tempted to just add slews of "empty" overrides just in case we might want to add some code there some time in the future.

  10. Pavel Minaev says:

    I think that the real surprise here is not in the behavior of "base" per se, so much as it is the apparent mismatch between "base" and "override" semantics. If "override M" in C would have overridden A.M and ignored B.M here as well, an argument for a consistent rule could be made: always determine those things at compile time. But this is not the case; so, from the perspective of a developer who wants "100% of the safe, predictable, understandable, non-dynamic, testable behavior" – well, you don’t have that here, either way!

    And the reason why a match here is important is that, in practice, "base" is used together with "override" – in an overridden method, to call a base implementation –  9 times out of 10, if not more often than that. So, for many developers, the mental model of what "override" means is _defined in terms of "base"_!

  11. barrkel says:

    I’m pretty sure you’re wrong on this one, Eric, particularly with respect to the principle of least surprise.

    The OO mental model for base.Foo() is that you pass the message along to your superclass. That’s it. It’s out of your hands then. Static analysis doesn’t come into the equation – it’s just an implementation detail.

  12. Gabe says:

    I’m not sure I can agree with you. As a matter of fact, you are the one who convinced me that you’re wrong: http://stackoverflow.com/questions/2323401/how-to-call-base-base-method/2327821#2327821

    It seems that you said that calling "base.base.M()" is illegal in C# because it could break the invariants of the base class. However today you say that I am stuck with "base.base.M()" even when what I meant was "base.M()" because it maintains my invariants.

    What good is maintaining my invariants if it breaks the invariants of my base class? How can I possibly expect to maintain my invariants if I’m doing something known to break the invariants of my base class?

    Well, at least now I know how to call "base.base.M()". I just have to compile against a version of the base class that doesn’t override M, and then run against the real version of the class that does override M.

  13. Ben says:

    I’m with the customer on this one too.

    This is quite contrary to my expectations.

    As others have said, I would expect to be able to switch and change existing DLLs and have related code automatically be updated without re-compilation.

    More fundamentally, though, I would expect the compiler to honor my virtual code, even if it thinks it knows better. Such optimizations belong in the runtime, not baked into the compiled IL.

    I now fear that I will need to review many virtual calls at the IL level and possibly patch the IL…

    I used to manually compile variance code before C# 4 which now supports it. Now I may need to do this on virtual calls… maybe I should just do all of my coding in IL.

    This is terrible. Please fix it.

  14. Pavel Minaev says:

    > Customer: You know, if you generated code that said “make a non-virtual call to Bravo.Foo”, the CLR will fall back to calling Alpha.Foo if there is no implementation of Bravo.Foo.

    > No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky.

    I discovered that  a while ago, and was rather surprised myself:

    http://stackoverflow.com/questions/1456785/a-definite-guide-to-api-breaking-changes-in-net/1522718#1522718

    But it is, in fact, well-specified by Ecma-335 (PII 3.19 "call"):

    "If the method does not exist in the class specified by the metadata token, the base classes are searched to find the most derived class which defines the method and that method is called.

    [Rationale: This implements “call base class” behavior. end rationale]"

    Interesting; if I understand the Rationale correctly, they have in fact specified it this way precisely so as to provide a way to implement the "base" semantics that the customer is asking for!

  15. Daniel Grunwald says:

    This is not crazy-seeming at all. In fact, I’m pretty sure this will be causing hard-to-find bugs when using existing binaries on .NET 4 – surely some non-sealed classes got new overrides?

    I suppose this is also the reason why this was fixed in Java.

    Yes it’s a breaking change and too late to fix that now (especially for existing .NET 2 binaries); but please fix it for the next compiler version! The meaning of "base" should be the immediate base class, not "whatever base class happened to have that method when the assembly was compiled".

  16. barrkel says:

    Another way to look at it: overriding a method with a body that just forwards the call to the base class, without doing anything else, should be a null operation. If overriding a method changes the way the code links, such that virtual method calls get skipped on their way up the hierarchy, then something is grievously broken.

    You might recall the argument against default parameters in C#: the problem being that they bake the constant value of the default into all the call sites, harming version resilience. What you are arguing for in this post is to make the *hierarchy, and the set of methods that its ancestors override* a *constant*. You are arguing for baking in the entire history of a method’s override chain, all the way up to the ultimate ancestor, at every base.Method() call, as a hard-coded constant.

  17. kvb says:

    Interestingly, the ECMA CLI spec’s description of the "call" opcode (section 3.19 of Partition III) includes this text:

    "If the method does not exist in the class specified by the metadata token, the base classes are searched to find the most derived class which defines the method and that method is called.

    [Rationale: This implements “call base class” behavior. end rationale]"

    This appears to me to indicate that the CLI spec writers agree with your customer about the anticipated behavior of base calls…

  18. barrkel says:

    Describing the customer’s expected behaviour as an entirely new kind of pseudo-virtual call is a bit over the top, too. It’s the normal behaviour in almost all OO languages.

    In which OO languages does this behaviour happen without recompilation? Java, apparently. Are there others? — Eric

    You also assert that “no C# programmer in the world has ever designed for” it – but I would assert exactly the opposite, that almost every C# programmer designs for the customer’s expected behaviour.

  19. arnshea says:

    Having worked in Java before C# you develop an expectation that methods are virtual by default.  So this position "feels intuitive" to me mainly because I encountered it first.  The arguments you make about breakage are imho persuasive.  Plus there are other ways to dynamically choose which method to execute (reflection for instance) that make your intention manifest instead of depending on implicit method call lookups.

    I think it boils down to a judgment call: When a developer names a method in an inheritance hierarchy with the same name as a method higher up in the inheritance hierarchy are they doing this by mistake or did they intend it as a replacement?

    The background of the person making the judgment call can come into play here: an academic may typically have access to all of the source code they’re using, a commercial developer typically doesn’t have access to the source code and is using all sorts of libraries entirely beyond their control.

  20. Adam Robinson says:

    I, too, am with the customer on this one. I think barrkel said it best: the idea of calling "base.M()" (in the minds of most developers) is that you’re passing the call to M() up the chain of inheritance, stopping at the first one you find (from a runtime perspective, not a compile-time static analysis perspective). While I haven’t yet encountered this issue, I know that I would have been utterly confounded by the fact that the compiler exhibited this behavior.

    It sounds like the only short-term solution is to insert override stubs for functions that might get overridden in the future.

  21. Pavel Minaev says:

    For what it’s worth, all languages that target .NET seem to match C# behavior – checked with VB and C++/CLI.

  22. bystander says:

    I have to say that like many other commenters, I’m on the customer side on this one.

    My mental model has never been "base.M() == Alpha.M()" but rather "base.M() == Bravo.M()", wherever that method definition may come from (Alpha, actually) – although I now know it’s wrong. But seriously: who would naturally assume that "base" may statically mean "base.base"?

    When you write base.M(), how can you know which method you are statically calling? Do you think people actually go and check at which level in the hierarchy M is defined? No, they surely don’t (especially with deep class hierarchies, e.g. WPF).

    When you write base.M() you actually think: "call the default behavior" before, after or during some custom processing you’re putting in place. Stuart’s StringList is a good example of this.

    I also agree this makes the whole framework fragile. You can’t fix bugs in .NET by releasing a service pack if that means adding an override in the middle of the class hierarchy, as Daniel Grunwald noted.

    This really looks like something that ought to be fixed, in my opinion. Note that the "fix" only needs to be applied to cross-assembly base calls. Every call to a base method defined in the same assembly can be kept as a non-virtual call (which would probably be preferable for performance, although the lookup only needs to be done once, as class hierarchies don’t change at runtime.)

  23. Eric Weinschenk says:

    I’m with the customer on this one. Your concern over ensuring that only the code you compiled against is run is moot, as that can be handled numerous other ways. The ability to dynamically swap out Bravo may be needed in certain environments, and the compiler is clearly failing to support that. M2C.

  24. Arno says:

    I agree with the customer as well. I’m actually quite surprised by this behavior. I thought the call was a dynamically determined call to the nearest matching method. I think this is something that needs to be changed, perhaps the same way Java did it by keeping it backward compatible.

  25. Steve Bjorg says:

    This is just plain broken, sorry.  Definitely a bad call.  Let me explain why:

    Suppose it wasn’t just M() that was being used, but also P() – and P(), unlike M(), was already overridden in the old Bravo, so Charlie’s base call to it binds to Bravo.P. Now a new assembly is deployed that has changes in both M() and P(). However, only HALF of those changes get picked up!!! Think of M() and P() as being called Add() and Remove() instead, and imagine the new version added a counter of how many items are added and removed. Suddenly, your application shows you that too many items have been added or removed, when in fact everything is balanced. Again, because only HALF of the new implementation is being used.

    I’m literally shocked that anyone would consider this the proper behavior of a virtual base method!  Please fix it.
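
    A minimal Python sketch of the half-update Steve describes. This is an analogy, not C#: the direct named-class calls in Charlie stand in for the non-virtual base-call bindings baked into Charlie.DLL at compile time (the class and method names just follow the example above).

```python
class Alpha:
    def add(self):    pass
    def remove(self): pass

class Bravo(Alpha):
    # New version of Bravo: BOTH overrides now maintain a balance counter.
    def __init__(self):
        self.count = 0
    def add(self):
        self.count += 1
        Alpha.add(self)
    def remove(self):
        self.count -= 1
        Alpha.remove(self)

class Charlie(Bravo):
    def add(self):
        Alpha.add(self)     # baked-in binding: old Bravo had no Add override
    def remove(self):
        Bravo.remove(self)  # baked-in binding: old Bravo DID override Remove

c = Charlie()
c.add()
c.remove()
print(c.count)  # -1: one add and one remove, yet the books don't balance
```

    Only the Remove half of Bravo’s new counter logic ever runs, which is exactly the silent imbalance being described.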

  26. Jason says:

    I’m also agreeing with the customer on this one. I find the scenario far more interesting, though, if we don’t assume that I "own" all of the DLLs or classes in question. Assume class Alpha and class Bravo are provided in some third-party library – say some framework like WinForms or ASP.NET where there is a concept of a "control". It’s natural in such type hierarchies to inherit and override certain "OnXXXX" methods, where you are also expected to call the base member. (For example, Alpha = System.Web.UI.Control, Bravo = UserControl, and Charlie = my class.)

    Now suddenly, if the framework needs to fix a bug which requires adding this method to the middle class, they can’t, since the only way they could do this would be to ask everyone in the world to rebuild their Charlie classes. I’d hate to see what "fun" would be had by all involved if this bug had security ramifications!

    Eric> "I am willing to give up safety, predictability, and the knowledge that what runs on my customer’s machines is what I tested."

    It seems like you’re trying to defend this customer against _their_ customers going "oh, well, I changed your assemblies by copying a few random ones into place, as well as one I built myself. Hey, your app doesn’t work now. What gives?" There’s a reason why most pieces of hardware have a little "warranty void if broken" sticker. 🙂 Somebody’s point of "it’s rather involved being on the other side of this airtight hatchway" seems to defeat any argument that this is being done to help security.

    Sadly, I might have to admit this is one of those sad "meh, it’s not good where we are but we can’t change it" times. 🙁

  27. cammerman says:

    "A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer’s data secure."

    If Bravo.DLL could be replaced, why couldn’t Alpha.DLL be replaced?  If Alpha.DLL is replaced without recompiling Charlie.DLL, you still get a hot-swapped behavior change, and the security argument is undermined.  The only way to truly prevent this from happening is to strong-name your DLLs and references.

  28. Carl D says:

    I’d have to agree with the customer on this one as well – the current behavior is surprising and potentially dangerous.  I find the example given here of a component library faced with the myriad virtual functions of the Control base class to be compelling that the current compiler behavior is wrong.

    That said, I’m 99.99% certain that I’ve never encountered this particular oddity.

  29. arnshea says:

    IIRC the methods-are-virtual-by-default style resulted in a good deal of headache for Java as more libraries came online.  JDK library methods were often inadvertently overridden because a 3rd-party library unintentionally declared a method with the same signature.  You could spend hours trying to figure out why a method call wasn’t working the way you expected, only to find out you weren’t calling the method you thought you were.

    I’m all for convenience but if you want to change how a method is resolved without recompiling against the shipped library you can use assembly redirection.  You still have to compile against the redirected-to libraries but IMHO that’s a good thing (for all the reasons Eric mentions).

  30. Sean Lynch says:

    I actually had thought that base.Foo() called the method of the immediate base class (it is after all what I told it to do).

    Since I have never had things set up the way described in your scenario, all of my expectations are built off of how a call to BetaObject.Foo() behaves, or off of having Alpha, Beta and Charlie in the same solution – not off of reading the spec. My expected behavior when swapping out Beta.dll would be that base.Foo() calls ((Beta)base).Foo(), not Alpha’s Foo.

    However, assuming that I have control over Charlie, I would have little problem with just recompiling the solution.

  31. John Melville says:

    +1 for another programmer who thought the customer’s mental model was right.

    That being said, dynamically swapping out base classes is risky business, and I would not be surprised if it bit me.

    You admit in your article that there are two equally valid mental models, but imply that your belief is that most programmers expect the static model.  Based on the limited sample of your blog comments, that belief appears to be incorrect.  Would you reconsider your opinion if, for the sake of argument, the compiler actually were surprising the vast majority of your developers?

    All that being said, fixing this doesn’t even come close to meeting the -100 point barrier for me.  Spend your time on a decent metaprogramming system.

  32. Robert Davis says:

    Unlike most of the commenters, I actually agree with Eric on this one. The customer controlled all 3 dlls; there is no reason not to distribute the recompiled Charlie.dll with the recompiled Bravo.dll. A compiler isn’t some big scary program (on the usage side of things): you just hit CTRL+SHIFT+B and DLLs magically appear in your output directory.

  33. Stuart says:

    @Robert: "The customer controlled all 3 dlls, there is no reason not to distribute the recompiled Charlie.dll with the recompiled Bravo.dll"

    That may be the case in this scenario but it’s certainly not always the case. If you’re the developer of a library, do you necessarily know to tell all your consumers that they have to recompile any time your library is updated? How are you supposed to patch bugs in your library if you can’t get every customer to recompile their code?

    Does .NET document rules for binary compatibility? As in "if you limit your changes to adding new methods and classes and … and … and … then existing code will work without recompilation"? I’ve always presumed "adding or removing an override to a virtual method" to be one of those binary-compatibility-preserving operations. Apparently it isn’t.

  34. Robert Davis says:

    @Stuart – no matter the language or operating environment, experience has told me, when in doubt, recompile. If I upgrade to a new version of a library, I’m going to recompile and test, and I would expect the same of people who upgrade to the latest and greatest of any library I wrote to do the same. Recompilation is cheap.

  35. Olivier Leclant says:

    Even if the object semantics are broken, it’s quite natural for a statically-checked language to make this optimisation. Finding at runtime which base method should be called would slow down each and every virtual call.

    Where I work, when we ship hotfixes, we just follow the rule of always delivering a fresh Charlie.dll when just Bravo.dll has changed. This ensures we can never run into this problem.

  36. Eric, I must agree with you.  Especially when there is such a simple way to get the customer’s expected behavior…

    public class Bravo : Alpha // In Bravo.DLL
    {
      public override void M()
      {
        base.M();
      }
    }

    … if the above code had been in Bravo before Charlie was compiled, then the virtual method call would have worked as he expected.

  37. I’m adding my voice to the chorus – the customer’s right, and the compiler’s behavior is quite surprising – surprising enough to be wrong.

    You say that base.M() is a non-virtual call – but that’s a misleading statement; it merely happens to be implemented as such currently.  The code is asking its _base_ class (in this context, Bravo) to execute a method – not just any method, but a virtual method.  The fact that Bravo happens to have implemented M() by silently falling through to Alpha’s implementation isn’t particularly discoverable nor relevant – who cares where Bravo got the implementation from?

    The current resolution means that adding or removing code such as the following is a breaking change (even without reflection or whatnot):

    class A { public virtual int F() { … } }

    class B : A { public override int F() { return base.F(); } }

    B’s implementation of F clearly looks like a no-op, and it’s extremely surprising that removing it changes the semantics of the program.

    Then there’s the problem of asymmetry: you say that you don’t expect Charlie to call Bravo when Bravo suddenly implements M() – but the other way around does work.  If you remove Bravo’s M() then, as expected, things just work – you don’t get an assembly load error complaining of a missing function; rather, the superclass’s method is chosen.  So base.M() walks the inheritance chain upwards and picks the first M implemented at compile time, and then again walks the inheritance chain upwards at runtime – hardly a sane mental model anywhere.  So, is base.M virtual or not? It’s marked virtual, it behaves as a virtual call when you remove the implementation, but it behaves like a static call when you add an implementation – that doesn’t make _any_ sense.

    The safety argument rings hollow – if you’re meddling with various assemblies and overriding virtual methods, and somebody with the rights to rewrite a particular assembly does so, how is it surprising that new or changed implementations will be picked up?  That’s kind of the point of virtual methods in the first place.  If you don’t trust code, then don’t trust an inherited virtual method to do what it happened to do when you compiled it.  Using that as a security boundary is asking for trouble.

  38. Focus says:

    I don’t agree with the customer’s model at all in this case, contrary to most here it seems.

    If you suddenly decide to override a method in a class that someone else down the line is consuming, you are in a way changing the contract with said class, so you need to recompile not only the middle class but anything that comes after.

    I find it much more logical and safe than the other behaviour.

  39. jsrfc58 says:

    Adam Robinson wrote:

    "While I haven’t yet encountered this issue, I know that I would have been utterly confounded by the fact that the compiler exhibited this behavior."

    And I would have to say, I would completely expect this behavior (oddly enough)…although I’ve never had to deal with it.

    Then again, I look at it from the point of view of having toyed around with building my own compilers in the past. To me, this sort of falls in the category of "it’s better to be safe and recompile" rather than assume anything about how things will be handled under the hood.

  40. Stephan says:

    I’m with the customer too. If M is virtual, I always thought that base.M() is a special virtual call that skips the implementation in the current class.

    If M is non-virtual, I expect it to be resolved statically. So, if someone adds a non-virtual M in Bravo, I’d expect the call in Charlie to still resolve to Alpha. In any case, newly introducing a method in Bravo that hides a non-virtual base class method, although Bravo is not sealed and has already been derived from, sounds to me like bad programming practice. It would only be legitimate if it can’t break anything.

  41. Pavel Minaev says:

    With respect to your update, Eric:

    > How is that situation made any different from the customer’s perspective by making the non-virtual method call a base call instead of an ordinary non-virtual method call? Surely what is good for some non-virtual calls is good for all non-virtual calls, no?

    I think the point of contention is that base.M() is not obviously a "non-virtual call" – in fact, those opposing it would rather want it to behave like virtual. This is even explicitly mentioned in some of the comments above.

    The informal definition of base-calls that many (most?) programmers use is "call the method of the nearest base class". More formally, this would amount to "do a virtual call, as if the type of the object was that of the immediate base".

    So, as a question, it is, perhaps, narrower than it really should be – it makes certain assumptions that are themselves contested by those who’d answer it differently from you.

  42. Pavel Minaev says:

    "Just recompile" is not an answer in general, because all 3 assemblies may come from different vendors. Don’t forget that Charlie may also be declared in some library which your code is using, and for which no source code is provided.

  43. Stephan says:

    Focus wrote:

    "If you suddenly decide to override a method in a class that someone else down the line is consuming you are in a way changing the contract with said class, so you need to recompile not only the middle class  but anything that comes after."

    It depends. If the newly introduced override can’t break derived classes, i.e. doesn’t change observable state or behaviour in a way that matters to derived classes, then introducing a new override should be ok. I suppose that in the customer scenario everything would have been fine if the C# compiler behaviour had matched the expectation of the Bravo.dll developer.

  44. Andrey Titov says:

    I always thought that if I don’t override some virtual method Foo, it behaves _exactly_ as if I had written "public override void Foo() { base.Foo(); }". So if there is nothing more to do than call the base method, I can safely delete the override, just as I can remove a default constructor with an empty body, or the add/remove accessors of an event.

    The actual behavior clearly creates a non-obvious problem for maintaining binary compatibility. Before, I was sure that nobody could skip my methods and directly call base methods, omitting my overrides. Now I can’t be sure that my class invariants are preserved.

    It seems I can now get two different behaviors at once: one when this method is part of an interface and I call it via that interface, another when I call it via an instance of the concrete most-derived class. And what happens with a call via the middle class, ((Bravo)new Charlie()).M()? I guess you will also get “Charlie / Alpha” – Bravo.M will be skipped even though there is clearly a call to Bravo.M().

    I always read "base" as "a virtual call, excluding the current class and all types below it in the hierarchy".

    So this behavior is clearly a bug for me. I vote to change it.

    I see the pros of the current behavior: a non-virtual call is much faster, this is a very rare case (though it looks quite possible), and there is no such "half-virtual" call instruction in IL (or is there?). But first and foremost this is wrong behavior, so it cannot be excused by these technical reasons.

    I want you to change this – either by replacing the non-virtual call with a "half-virtual" call, or by having IL verification force recompilation by checking whether any overrides are being jumped over by a non-virtual call.
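
    Andrey’s guess about ((Bravo)new Charlie()).M() is right: the cast changes only the static type, virtual dispatch still selects Charlie.M, and Charlie’s baked-in base call goes to Alpha. A rough Python model of that outcome (a direct call to Alpha.m stands in for the compiled-in non-virtual base call; this is an analogy, not the CLR):

```python
class Alpha:
    def m(self):
        return "Alpha"

class Bravo(Alpha):
    def m(self):  # the newly added override in the new Bravo.DLL
        return "Bravo / " + Alpha.m(self)

class Charlie(Bravo):
    def m(self):
        # stands in for the non-virtual call to Alpha.M baked into Charlie.DLL
        return "Charlie / " + Alpha.m(self)

# "((Bravo)new Charlie()).M()": the declared type of the variable is
# irrelevant; dispatch goes by the runtime type of the object.
b = Charlie()
print(b.m())  # Charlie / Alpha -- Bravo.m is skipped
```

    Only an actual Bravo instance would run the new override: Bravo().m() yields "Bravo / Alpha".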

  45. Quick extra note: some people noted that this is an optimization; if so, that optimization can still occur at assembly load time; there’s no need for the compiler to bother with it, and you still retain the lower cost of non-virtual dispatch at runtime.

    I’m also curious how fixing this bug is supposed to break existing code – It strikes me as very odd that it’s even _possible_ to call Alpha’s M() without Bravo’s consent – that’s certainly not easy (or even possible?) to achieve normally on a virtual function like that – so it’s very unlikely there’s any code relying on that very hard-to-trigger behavior – right?

  46. Daniel Grunwald says:

    To all of those who say: "recompile to be safe" – there are scenarios where you cannot recompile.

    1) Our application has a plugin model. Most plugins are rarely updated compared with our application, so it is important for us to allow old plugins to run in new application versions. We don’t have any source code for these plugins available. So we try to keep our application binary compatible with previous versions.

    This issue means for us that we are unable to ever add any overrides in any non-sealed class. This includes quite a lot of user controls (WPF and WinForms) where adding overrides to existing methods is considered normal. Plugins also overriding these methods expect "base.OnEvent(…)" to mean "handle that event the usual way", not "bypass the app’s usual handling and call WPF directly, breaking the app’s invariants".

    2) We’re producing a WPF control library. It runs fine on both .NET 3.5 and 4.0. However, as we’ve learned in this post, using a .NET 3.5 compiled library may break WPF’s invariants by calling the wrong base method (unless the WPF team avoided adding any overrides) – so from now on, we and our customers have to deal with maintaining two separate builds where one could suffice if this issue was fixed.

    In summary, this issue makes binary compatibility very, very fragile. I thought binary compatibility of assemblies was an important feature of C#, but it appears I was mistaken 🙁

  47. df says:

    While I have tremendous respect for Eric, I have to agree with the majority of those who have commented here that the behavior is unexpected and probably undesirable (at the very least violating the principle of least surprise).

    First, is this behavior consistent with the spec, particularly the last paragraph of §7.5.8? "When a base-access references a virtual function member (a method, property, or indexer), the determination of which function member to invoke at run-time (§7.4.4) is changed. The function member that is invoked is determined by finding the most derived implementation (§10.6.3) of the function member with respect to B (instead of with respect to the run-time type of this, as would be usual in a non-base access). Thus, within an override of a virtual function member, a base-access can be used to invoke the inherited implementation of the function member."

    Second, the "evil hackers" argument seems like something of a red herring.  As you yourself stated in a previous entry: "When you call a method in Smith and pass in some data, you are essentially trusting Smith to (1) make proper use of your data[…], and (2) take some action on your behalf. "  If Bravo is compromised yet we trust it sufficiently to derive from a class defined within it, we already lost, no?  Besides, either the assembly is strongly-signed and is trusted (in which case we trust it to not be evil) or it isn’t (in which case nothing is stopping evil code anyway).

    Third, as an author of class C, if I call base.M() should I care if (or do I even have any means to know) whether M() is immediately defined by the parent class?  By not defining trivial overrides (protected override void M() {base.M();}), is B explicitly abrogating its ability to override virtual calls to M from derived classes?

    Fourth, the key difference between the example the compiler faced and the counterexample you defined in the epilogue is that of M() being a virtual method.  I think it fair as a developer to expect that a call to a virtual method can change at runtime to an implementation defined by a subclass, and to adequately prepare for this possibility.  For a non-virtual method, I would probably expect the call (this.M()) to equate to its compile-time equivalent (in this case, ((Alpha)this).M()).

    It would seem that by calling a virtual method to begin with, we are already, as you state, "giv[ing] up safety, predictability, and the knowledge that what runs on my customer’s machines is what I tested".  If this is unacceptable, why would we swap out the DLL for a base class in this fashion to begin with?

  48. Stuart says:

    I was going to post a response to your update but Pavel’s post at 11:06am that starts "With respect to your update, Eric:" says almost exactly what I’d want to say. To elaborate slightly:

    In my mental model, a base call is not a non-virtual call: in fact, my mental model of C# doesn’t include a concept of a "non-virtual CALL" at all. My mental model is that non-virtualness applies to *methods*, not to *calls*. So in the scenario in your update, you are making a call to the non-virtual method Alpha.M(). In the original scenario, you’re making a call to the *virtual* method M(). The "base" prefix has a special meaning that indicates the virtual lookup should proceed from the immediate base class of the current class, but it doesn’t negate the "virtual"ness. It doesn’t turn a virtual method into a non-virtual method, just changes where the virtual lookup starts from. And "base" refers to the base class of the caller, which is to say that "base" means Bravo, not Alpha.
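
    For what it’s worth, Stuart’s model – a virtual lookup that proceeds from the immediate base class, at runtime – is roughly how dynamically-bound languages behave; Python’s super() is one concrete example. A sketch for contrast (this is an analogy, not C#):

```python
class Alpha:
    def m(self):
        return "Alpha"

class Bravo(Alpha):
    def m(self):
        return "Bravo / " + super().m()

class Charlie(Bravo):
    def m(self):
        return "Charlie / " + super().m()

with_override = Charlie().m()     # "Charlie / Bravo / Alpha"

# Deleting Bravo's override at runtime changes what the SAME call site in
# Charlie does, because the base lookup happens at call time:
del Bravo.m
without_override = Charlie().m()  # "Charlie / Alpha"
```

    Under this model, adding or removing a middle override is automatically picked up by existing derived classes – which is precisely the behavior the customer expected.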

  49. Pavel Minaev says:

    > Before I was sure that nobody can skip my methods and directly call base methods omitting call my overrides.

    Note that this was never true, and will not be true even if this issue is treated as a defect by the C# team and fixed for C#. The reason is that you cannot assume that other code running on .NET is written in C#, and so any limits imposed by the C# compiler which aren’t also backed by the CLR can be worked around by simply using a different language (or even CIL directly).

    With respect to calling virtual methods non-virtually, it should be understood that CLR allows such methods to be targeted by "call" instruction (which is for non-virtual calls). For non-verifiable code, _any_ accessible virtual method on _any_ type can be called non-virtually by _any_ other type!

    Verifiable code has additional restrictions, but those are actually in line with the present C# behavior:

    "When using the call opcode to call a non-final virtual method on an instance other than a boxed value type, verification checks that the instance reference to the method being called is the result of ldarg.s 0, ldarg 0 and the caller’s body does not contain starg.s 0, starg 0 or ldarga.s 0, ldarga 0.

    [Rationale: This means that non-virtually calling a non-final virtual method is only verifiable in the case where the subclass methods calls one of its superclasses using the same this object reference, where “same” is easy to verify. This means that an override implementation effectively "hides" the superclass’ implementation, and can assume that the override implementation cannot be bypassed by code outside the class hierarchy…."

    So, in short, even in verifiable code, non-virtual calls to virtual methods are okay on _any_ method, so long as the receiver ("this") is the same as for the calling method. So skipping levels of hierarchy is perfectly okay. Note how the rationale even acknowledges that by saying "… cannot be bypassed by code _outside the class hierarchy_".

    So there is no guarantee that any class invariants will be preserved, if they are maintained by virtue of overriding methods – any descendant can always ignore the override.

    > and there is no such "half-virtual" call instruction in IL (really isn’t?).

    There is. As noted above, if the IL specifies a non-virtual call to Bravo::Foo, even if Bravo does not itself declare the method Foo, the call will be correctly dispatched at runtime (to Alpha::Foo, or however high up the hierarchy is needed to find an implementation).
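
    A rough model of the resolution Pavel describes: the runtime resolves a non-virtual `call Bravo::M` by walking up from the named class to the nearest class that actually defines the method. This is only an illustrative sketch of that lookup, not the actual CLR algorithm; the names follow the article’s example.

```python
class Alpha:
    def m(self):
        return "Alpha"

class Bravo(Alpha):
    pass  # old Bravo: no override of m

def resolve_nonvirtual(cls, name):
    """Model of resolving `call cls::name`: return the implementation from
    the first class at or above cls that actually defines the method."""
    for c in cls.__mro__:        # Bravo, Alpha, object, ...
        if name in c.__dict__:
            return c.__dict__[name]
    raise AttributeError(name)

target = resolve_nonvirtual(Bravo, "m")
print(target is Alpha.__dict__["m"])  # True: `call Bravo::m` lands on Alpha.m
```

    So a compiler that emitted `call Bravo::M` (instead of `call Alpha::M`) would get the fallback behavior the customer wanted: when a new Bravo adds its own M, the same lookup finds it first.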

  50. Mark says:

    I’m going to have to go with Eric’s side.  With the given example we’re dealing with strings, and it’s clear that the method in question is incorrect for both B and C, since the string printed shows the inheritance hierarchy.  In a more complex example, though, you might have a calculation that, for the incorrect version of B, is correct for C.  So when fixing B you would actually break C.

    Regardless of whether or not another company makes C, it isn’t safe to assume that fixing B also fixes C, and it would be as likely that fixing B now breaks C.

    We’re not talking about tweaking the code.  We’re talking about adding a completely new function (on the level of B).

  51. Stuart says:

    Sorry, one clarification on my last comment, where I said my mental model is that there’s no such thing as a non-virtual call: I do know that the *implementation* in the CLR is that there *is* in fact such a thing as a non-virtual call, and that in the C# language non-virtual calls are used both for "base.Foo()" and for regular calls to non-virtual methods. But I consider that an implementation detail and not a reason to suppose that those two things should behave the same way.

    I consider it the job of the compiler to translate the C# language into CLR constructs in whatever way makes the most sense. The CLR lacks a "base call" primitive that would map exactly to my mental model of base.Foo(), so using a nonvirtual call to the base class method is obviously the way to implement that. But the desired behavior should be what drives the implementation, rather than the implementation as a nonvirtual call driving the choice of behavior.

  52. Marcel Popescu says:

    As Ayende has said a while ago, when most customers have a different mental model than the Microsoft team, it’s the Microsoft team that should change how things behave. Being Ayende, he actually got the MVC team to make that change 🙂

    Note: this is the second time I see Eric have a "weird" mental image of what should happen; the previous one was regarding static virtual methods (also known in Delphi as class virtual).

  53. Pavel Minaev says:

    @Mark:

    > I’m going to have to go with Eric’s side.  With the given example we’re dealing with strings and it’s clear that the method in question is incorrect for both B and C, since the string returned shows the inheritance hierarchy.  In a more complex example though, you might have a calculation that for the incorrect version of B is correct for C.  So when fixing B you would actually break C.

    This interpretation is inconsistent with proper use of virtual & override. The class that introduces a virtual method specifies the contract for that method, which all derived classes are expected to adhere to. If they do not, they violate LSP, since a client could call the override via a reference to a base class type without even knowing it; he has the right to expect the contract to be upheld regardless of the effective type of the referenced object.

    Consequently, whether B overrides A.M or not, the implementation of C can expect that base.M – regardless of what it ends up calling – will uphold the contract for A.M. If the newly introduced B.M does not do so, it would break far more than this particular edge case – it would break any client code that makes virtual calls to A.M in a situation where B may be the effective type of receiver.

  54. Stuart says:

    Just to make the distinction even clearer: If the new version of Bravo looked like this:

    public class Bravo : Alpha {

     public new void M() { … }

    }

    ie "new" instead of "override"…

    THEN I would expect that Charlie’s base.M() should still end up calling Alpha.M() instead of Bravo.M(), until Charlie got recompiled.

  55. arnshea says:

    If you call M() from a Charlie type reference (e.g., Charlie c = new Charlie()) then later drop in a recompiled Bravo.dll you won’t get the new Bravo.M() call.

    To get that, you have to use a Bravo type reference.  If you call M() from a Bravo type reference (e.g., Bravo b = new Bravo()) then later drop in a recompiled Bravo then you will execute Bravo.M() followed by a call to Alpha.M().

    Lastly, dropping in an override in Bravo doesn’t make Bravo.M() virtual.  The only method that’s virtual is Alpha.M().

  56. Andrey Titov says:

    Also I want to add that when you say base.Feed in Giraffe, you might not think that you are calling Animal.Feed. Instead you should think: my base class is a black box (generally you have no access to its code), and I call the implementation of this method that the black box provides. Because I really don’t know how this box provides the method – and even if I knew, I couldn’t rely on a particular implementation – I ask the box to handle the call anyway. You may ask only this box, the class one level up in the hierarchy, not any other class higher up. And it can do with the call whatever it wants: provide its own implementation, or forward the call to its base class by omitting its own implementation – but you don’t need to care about that.

    So from my point of view, "base" is "a one-level-up class name substitution (exactly the class that I explicitly derived from)", and I don’t know and don’t care what’s going on above that type.

    If I understand correctly, from your point of view "base" is a substitution of the name of the class that actually implements the particular method, so the actual substitution depends on the method name after the dot. Why, then, do we have no syntax to access any base class’s (or just any class’s) method and make a non-virtual call to it?

    I guess the actual behavior of "base" is correlated with the fact that I cannot have any (direct or indirect) base type less accessible than my class, and all information about which class overrides which method is exposed and public. So whether a type overrides a particular method is part of its public contract. And that’s the thing I cannot agree with. I think it should make absolutely no difference to the public contract of your class whether you override a method, stick with the base implementation by omitting your own, or stick with the base implementation by providing a one-line override that just calls it.

  57. Leo Bushkin says:

    I would like to share my two cents. Before I do, I will say that a strong case can be made for both positions of this issue. Furthermore, it’s entirely unclear what the appropriate or most desirable behavior should actually be, and given that there are potentially significant consequences to the large body of deployed code already shipped, a lot of thought (by folks smarter than me) needs to take place before such a change should be made. That said…

    After reading the article and your (Eric’s) explanation of what happens, I have to say I was surprised. My "mental model" leaned in the direction that the customer, and many other responders, have described – namely that base calls in a virtual method send the message up the inheritance chain, and don’t skip directly to the base version that existed at compile time.

    Second, similar to the topic of how the stack works in .NET/C#, I’ve always treated the mechanism of the base call as being an implementation detail – an abstraction that I didn’t have to worry about. However, in this case, it turns out to be a leaky abstraction – you have to know a great deal about how the compiler wires a base call to be able to design and implement inheritance hierarchies that behave correctly – particularly in special cases like runtime substitution of compiled code. I have to say that I find that a bit unsettling because the vast majority of C# developers likely do not have this level of understanding about how their code is translated into IL.

    Third, I now realize that there are potentially a large number of cases where this type of problem can be introduced. Take, for example, the .NET framework itself. It’s quite common to inherit from classes in the .NET BCL – these classes can, in turn, inherit from yet others within the BCL. Now imagine a case where some user-defined class inherits from a version 2.0 .NET class – but at runtime, the compiled code is loaded into a different version of the CLR than what the code was originally compiled against (there are numerous cases where this actually happens in real world code) and dynamically bound against different versions of the BCL assemblies. It’s entirely plausible that the BCL code itself has added some overrides that may have previously been omitted. Consequently, at runtime the behavior continues to call the wrong override. Another example would be service releases of the .NET framework itself, which could introduce overrides of calls within the inheritance model that didn’t previously exist. Most shipped applications are not going to get recompiled when a service pack is deployed … nor in most cases is that even possible. It seems that the authors of the .NET BCL will have to be careful in the future to be aware of such impacts.

    Fourth, I don’t view base calls and overload resolution as the same thing. In the update to the article, you mention that swapping in an M(int) overload in Bravo would exhibit the same type of problem as the virtual method’s base call does.

    Fifth, I am surprised that this should be such a rare case. It would seem to me that this behavior should occur more frequently than it does. The fact that this issue hasn’t been more common and impactful is hard to reconcile with my intuition about how often such a change could be introduced, given the amount of existing, compiled code out in the real world.

  58. The argument about "calling the code I tested against" is invalid. If you want that (and you probably do want that), you need to sign your assemblies(*). If you don’t sign them, _all_ calls into an assembly are calls into the unknown, not just base calls.

    Secondly, many people see turning virtual calls into non-virtual ones as an optimization, that shouldn’t change semantics. For example, if a method in a sealed class calls a virtual method in that same class, the call can be optimized into a non-virtual call, because both methods are in the same module. Calling a virtual base method can be optimized statically only if the base method is in the same module. If not, the optimization should be postponed until runtime, when all information is known. Note that the JIT can (and should) do the optimization, such that there is a cost (if any) only on the first call.

    (*) Even signing doesn’t give you this protection. I can easily recompile an assembly, even when modified, to have the same strong name as the previous version. In fact, I do that regularly when my changes don’t break the API (for example, they’re bug fixes or optimizations). In fact, I use that as a feature. See also http://blogs.u2u.net/kris/post/2007/07/20/Versioning-NET-Assemblies.aspx.

  59. Mark says:

    @Pavel:

    I see your point. I hadn’t thought that far into it, but after posting I was trying to come up with a real-world example of what I was saying and couldn’t come up with anything that wasn’t hacky at best.

    With that said, I think I’m swaying sides a bit on this, though I think Leo said it well that it’s unclear what’s really desirable.

    When it comes to why this hasn’t been seen more often with releases of .NET, I would assume it has to do with the assembly manifest pointing to the specific version of .NET an assembly was built against.  I don’t completely know what the default is for this though.

  60. orangy says:

    " I want something on some base class to be called."

    I wonder: if the customer introduces a new Bravo.dll where the base class of Bravo is no longer Alpha, but it still has a method with the same signature, does he expect it to hot-swap into an entirely different hierarchy?

  61. >Without recompiling Charlie.DLL, should Charlie.DLL now start calling Bravo.M(int) ?

    No it should not. Here’s why:

    In my mental model, the phrase "calling a method" is an abbreviation for "sending a message to an object, which reacts by executing a method". That’s where the word "method" comes from: it’s the object’s method of responding to a message.

    Overload resolution is about determining the message to send, not the method to execute. Overload resolution _conceptually_ happens at the call site, when compiling the call. Choosing the method to react to a particular message _conceptually_ happens at the receiving site. As the receiving site may have been compiled before or after the calling site, the call site cannot statically use any information about it, except when the receiving site is compiled together with a call site, because it is in the same module.

    Note that the same is true for inlining a call. A hypothetical super-optimizing C# compiler would be allowed to inline a non-virtual call (before compiling to IL that is) if and only if the callee is compiled into the same module as the caller.

    Another point of view, compatible with the previous one, is that overloads are actually different methods, with different messages (message = signature + parameter values). It’s like the good old C++ compilers, that use(d) name mangling to enable overloading method names. I think of overloads as methods with different names.

    To complete the point: virtual methods are very different from overloaded methods, as a virtual method and its override are two different methods reacting to the same message (or slot in the VMT), where two overloaded methods react to two different messages (and if they happen to be virtual, have two different slots in the VMT).
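    The distinction above can be sketched in C# (hypothetical names, echoing the Animal/Giraffe example from an earlier comment): the two overloads are separate "messages" picked apart at compile time from static types alone, while the virtual method is one message whose handler is chosen at runtime.

```csharp
using System;

public class Animal
{
    // Two overloads: two different "messages", chosen at compile time.
    public string Feed(object food) => "Feed(object)";
    public string Feed(int pellets) => "Feed(int)";

    // A virtual method: one message, one vtable slot, resolved at runtime.
    public virtual string Name() => "Animal";
}

public class Giraffe : Animal
{
    public override string Name() => "Giraffe";
}

public static class Demo
{
    public static void Run()
    {
        var a = new Animal();
        object boxed = 42;
        Console.WriteLine(a.Feed(boxed)); // "Feed(object)" - the static type decides
        Console.WriteLine(a.Feed(42));    // "Feed(int)"

        Animal g = new Giraffe();
        Console.WriteLine(g.Name());      // "Giraffe" - the runtime type decides
    }
}
```

    Note that boxing 42 into an object changes which overload is chosen, even though the value is the same – the overload decision is frozen into the caller’s IL at compile time, exactly like the base call discussed above.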

  62. np says:

    To commenters:

    1) Whether you agree or not, other people may depend on the current behavior without even realizing it.  They would be quite surprised if their code stopped working and required them to change their entire inheritance model.

    2) There are many other ways to accomplish the desired behavior

    3) Do you really want your code to depend on such a subtle and misunderstood detail? Wouldn’t it be better if such an important architectural detail were spelled out explicitly?

    4) If you really can’t recompile, see point 2

    5) To those of you who say Microsoft should change to match the customer’s mental model, well, other customers may have different mental models than you.  Please don’t break my code because you think yours is more important.

  63. Tim Goodman says:

    Eric wrote:

    > One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

    And share all the interesting parts with us in a lengthy series of blog posts, I hope. 🙂

  64. Darren Clark says:

    Count me in on the "surprised" crowd.

    If an assembly is not strongly named, then it is possible and allowed to run against a version that is different from the one originally compiled against. Heck, even if it _is_ strongly named you can do it. The scenario here is a library upgrade/patch. Yes, if it is your code you should recompile and retest against the new library. But what about the case of a library that is used by another library, or an executable? This is a valid scenario.

    Steve Bjorg hit it on the head with the example that clearly shows it is the wrong behavior. I have a system that uses a 3rd party library (hardly uncommon in open source tool installations). Unbeknownst to me (as a customer), the implementor derived from a base class in the library.

    Example:

    LibraryA has the following:

       public class LoggerBase : IDisposable
       {
           protected virtual void Dispose(bool disposing)
           {
           }

           public void Dispose()
           {
               GC.SuppressFinalize(this);
               Dispose(true);
           }
       }

       public class FileLogger : LoggerBase
       {
           private FileStream file;

           public FileLogger()
           {
               file = new FileStream("Log.log", FileMode.Append);
           }

           // Probably should implement Dispose, but we’re going to fix this class later anyway.
       }

    This is part of a framework used by some build system I have or something, and they inherit SpecialLogger from FileLogger.  

    Then the library implementors decided that using a managed resource was too "expensive", and decided to go an unmanaged route.  They put out a new version with:

       public class FileLogger : LoggerBase
       {
           private int filePtr;

           public FileLogger()
           {
               filePtr = AllocateUnmanagedResource();
           }

           protected override void Dispose(bool disposing)
           {
               ReleaseUnmanagedResource(filePtr);
               base.Dispose(disposing);
           }
       }

    If I patch my library, then I start leaking handles because only HALF of the modified code is executing. In what way is that _ever_ correct behavior?

    "Surely what is good for some non-virtual calls is good for all non-virtual calls, no?"

    This is really the crux of the different viewpoints. This question doesn’t even make sense to me, because until now I assumed that base.Foo() WAS virtual. I guess to you it makes sense that it isn’t, because that’s how it was done.

    So of course the new non-virtual M(int) shouldn’t be called. It wasn’t a virtual call, it was a non-virtual call to Alpha.M(object). Somehow I don’t see the conflict here.

  65. jsrfc58 says:

    Couldn’t this unusual possibility also occur?

    public class Bravo : Delta // In new Bravo.dll, now pointing to a new base class
    {
        public override void M()
        {
            Console.WriteLine("Bravo");
            base.M();
        }
    }

    …where Delta looks like…

    public class Delta
    {
        public virtual void M()
        {
            Console.WriteLine("Delta");
        }
    }

    Now what happens?

  66. Yogi says:

    Eric,

    This has been a fascinating discussion. Is there any way to share these design notes? I am sure these notes would be far more interesting read than the spec. I know, I know, you will not be able to share those design notes, but it wouldn’t hurt to ask, right? 🙂

  67. Stuart says:

    I see the problem with the update. And I see why it’s not a problem for Java – there’s no "new" or "override" keyword so it’s impossible for a method inserted into the hierarchy with the same name and signature NOT to be an override.

    I’m not sure I agree with the tradeoff; I’d rather have a crash in the case where someone introduces a new private method with the same name as an override, than have the wrong behavior silently succeed. But at least I understand the reason for the tradeoff.

    And it does, indeed, seem to be one without a good solution, unfortunately. I’d argue for the new CLI opcode, but obviously the C# team tried that and failed. Perhaps a compiler option to switch to calling Bravo.M() even if it meant that certain kinds of changes to Bravo might cause the code to crash? To my mind, "certain kinds of changes to Bravo might cause my code to crash" is better than "certain kinds of changes to Bravo might cause my code to completely ignore all the invariants that the author of Bravo was trying to enforce, but still continue to run without giving any error messages".

    The only other solution I can come up with is to have a way for the compiler to automatically insert a do-nothing stub for every non-overridden base class virtual method, equivalent to override M() { base.M(); }. But that’d enlarge the code quite a lot and mostly unnecessarily.

  68. Leo Bushkin says:

    Eric,

    I am always impressed by the thoughtfulness and thoroughness of the C# compiler team – the fact that they identified this issue in the early days is impressive – and the complexity of the problems that have to be coordinated between the CLR team and the language teams is also remarkable. The insight this kind of discussion provides into language and runtime design is wonderful; thank you.

    One question that occurs to me though.  When adding a private new M(), why can’t the compiler also generate an overriding M() that pass-through calls base.M()? Presumably this would allow the CLR at runtime to correctly select between the two. I understand that normally (in user-written C# code) you can’t both override a method M() and supply a private method M() directly, but what prevents the compiler from being able to do so? Presumably the private/override versions of M() are placed in different "slots", yes?

  69. Gabe says:

    If I understand the behavior correctly, it used to work as most of us expect. However, swapping out the Bravo assembly with one that has

    public class Bravo : Alpha
    {
        new private void M()
        {
            Console.WriteLine("Bravo");
        }
    }

    would cause your program to crash unless you recompile. Then they fixed it to make that scenario work, but broke the scenario where a maintenance release (fixing bugs, patching security holes) adds an override to a class that didn’t have one before.

    So instead of an extremely unlikely situation breaking my program and crashing, an unusual (but still more likely) situation silently breaks my program’s invariants with no way to debug? I think I like the old behavior better.

    I don’t see "new private" methods as something you add to a class; I see them as something you already had in your class, but you had to add the "new" only because a base class was modified to include a new method of the same name. If I’m the author of Bravo class, and I already know that my base class Alpha has a method M, why would I create a new private method M? Unless I really wanted to confuse all maintainers of the class, I would pick a name for the method that doesn’t have the same name as an unrelated method of the base class.

    Am I missing something here? How often does somebody add a new private method to a class that has the same name as a virtual method of a base class?

  70. Ben Voigt says:

    Aren’t all the calls we’re discussing made using a particular mdtoken representing the target method, and not by name?  So I’m a little surprised that there exists an mdtoken for B::M() when B didn’t override A::M(), and I’m doubly surprised that the metadata token which represents that potential B::M() is structured in such a way that it could match "B::new M()" instead of "The implementation of A::M() in class B" which clearly is that sort of "truncated search virtcall" that fixing this problem requires.  In fact, use of Reflection.Emit certainly led me to believe that the metadata token was of the latter style.

    In fact, it sounds like the CLR could provide the desired behavior without a new instruction.  I propose that if you upcast arg.0 and then do a virtcall, you ought to get the "truncated search".  Currently you’d do this to call the most overridden method prior to a "new" method; it seems like it shouldn’t break any existing code to define the truncated search when this is used in the absence of a "new" method (and then it continues to work if a "new" method is added by an update to the assembly defining an intermediate parent class).

  71. Pavel Minaev says:

    > Whether you agree or not, other people may depend on the current behavior without even realizing it.  They would be quite surprised if their code stopped working and required them to change their entire inheritance model.

    Note that this works both ways – there may also be existing code which is written using the "wrong" mental model, and is therefore buggy, without people who wrote it knowing about that.

  72. Pavel Minaev says:

    @Ben

    > Aren’t all the calls we’re discussing made using a particular mdtoken representing the target method, and not by name?  So I’m a little surprised that there exists an mdtoken for B::M() when B didn’t override A::M()

    Have a look at Ecma-335 describing "call" instruction. I had cited the relevant paragraph in an earlier comment, but here it is again, for convenience:

    "If the method does not exist in the class specified by the metadata token, the base classes are searched to find the most derived class which defines the method and that method is called.

    [Rationale: This implements“call base class” behavior. end rationale]"

    >  and I’m doubly surprised that the metadata token which represents that potential B::M() is structured in such a way that it could match "B::new M()" instead of "The implementation of A::M() in class B"

    There is no difference between the two. On MSIL level, a non-newslot method named M will override any method M with a matching signature inherited from a base class. Thus, a token for B::M always references M in B, whether it overrides anything or not.

    .override is a different thing, but it’s orthogonal to all this.

    > In fact, it sounds like the CLR could provide the desired behavior without a new instruction.  I propose that if you upcast arg.0 and then do a virtcall, you ought to get the "truncated search".  Currently you’d do this to call the most overridden method prior to a "new" method, it seems like it shouldn’t break any existing code to define the truncated search if this is used in the absence of a "new" method (and then it continues to work if a "new" method if added by an update to the assembly defining an intermediate parent class).

    It seems to me that it would break any existing IL that presently contains no-op upcasts on ldarg.0. While the upcasts are meaningless, such IL is perfectly valid, and has a well-defined meaning already.

  73. Darren Clark says:

    Hrm… so the behavior was what a lot of us expected, but a really obscure case would crash the CLR. So a breaking change to the C# compiler was made in 2003.

    I really don’t see how this would have been a breaking change in the CLR. After all, a compiler would still be free to make any sort of call it wanted. It would really be a "do a virtual dispatch call on object X, treating it as type Y for method lookup" instruction. Could be a useful instruction in other cases.

    It seems that would have been a not-too-difficult and fairly safe _addition_ to the CLR, in order to prevent a breaking change to the compiler.

    Weird.

  74. carlos says:

    The current compiler behaviour can still cause runtime crashes.  Suppose you have four classes in the hierarchy.

    Insert an extra class Aleph in the hierarchy between Alpha and Bravo and give Bravo an override of method M.  When Charlie is compiled the call to M goes to Bravo.  In a future update Bravo loses its override of M and, at runtime, the call to Bravo.M ends up at Alpha.M and everything still works.  But then another update comes along adding a private method M to Aleph.  And the code crashes at runtime.  But maybe this is a little far-fetched.

    P.S. I know the blogging software isn’t your problem, but the captcha image doesn’t appear in FireFox.  Since it’s a JPEG with an aspx extension I’m guessing the server isn’t sending the correct mimetype.

  75. TheCPUWizard says:

    This has been a very interesting discussion, and I am surprised by the number of people who have not been aware of this. Obviously, the majority of readers (or at least posters) are not aware of the basic tenet:

    "If you develop a NON-SEALED class, you MUST test the implementation via (often multiple) derived classes."

    To minimize the impact, one technique is to override every virtual method whose implementation you may possibly want to change at a later date, with a simple call to base. [We actually do this via our automatic "class setup" wizard].
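    A minimal sketch of that technique (hypothetical names): the pass-through override is semantically identical to omitting the override, but it claims a slot in the shipped class, so consumers compiled against it bind their base calls here and will pick up a future real implementation without recompiling.

```csharp
using System;

public class LoggerBase
{
    public virtual string Format(string msg) => "[log] " + msg;
}

public class ShippedLogger : LoggerBase
{
    // Deliberate pass-through: behaves exactly as if the override were
    // absent, but a consumer's base.Format call now compiles against
    // ShippedLogger.Format, so replacing this body in a later version
    // of the DLL takes effect without recompiling that consumer.
    public override string Format(string msg) => base.Format(msg);
}
```

    The cost is an extra frame per call and some code bloat, as other commenters note, but it keeps the versioning behavior predictable.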

  76. Pavel Minaev says:

    @carlos:

    > in a future update Bravo loses its override of M

    Well, removing members has always been considered a versioning-breaking practice, so this isn’t anywhere as surprising as Eric’s original case (which is merely adding a non-virtual member).

  77. Sandro says:

    The customer is right, Eric is wrong. Eric’s solution breaks separate compilation, and it breaks the simple OO delegation model, both of which are unacceptable IMO.

    I’m also surprised that it would be so difficult to implement this in the CLR. All the metadata required to resolve the base class call is available to the VM in the CIL. If derived classes indeed cannot access private members, then that member should not be in the publicly accessible vtable, so resolution isn’t a problem. If it must be in the vtable for some odd implementation reason, then add protection information, i.e. private/protected, to the slot resolution process so that it skips inaccessible members.

  78. Gabe says:

    This is the basis for a great new interview question:

    What are the semantic differences between classes Bravo1 and Bravo2? Which class definition is better and why?

    class Bravo1 : Alpha { }

    class Bravo2 : Alpha { public override void M() { base.M(); } }

  79. James H. says:

    Eric,

    With all due respect, I think that you are wrong.

    This behavior is quite disturbing.

    So much that I would even delay VS2010 just to fix it, although I know that it is *not* realistic. (SP1 would do fine.)

    I am rather concerned about the effect that this issue may have on my code.

    I agree with most of what the other commenters are saying here, so there is little need to repeat.

    Request: At the very least, you and your team need to sit down and really discuss this issue in depth.

    Please feel free to solicit us for more discussion. I am sure many of us here feel passionately about this issue and would like to be part of the discussion / solution.

  80. Duncan Kennedy says:

    "if you change a base class then you should recompile your derived classes"

    That doesn’t feel right.  My base classes exist as a rock for me to derive reliable, predictable and very well understood behavior from; they help me form version 1 of my application and when version 2 comes along they are often the favourite candidates for extension.  Some of these base classes are used outside of the core team and are deployed in applications that are not interested in the new functionality, but would like the latest base classes for the various fixes and rewrites in the classes and methods they actually do use.  They aren’t very interested in rebuilding redistributing their derived classes though!

    In essence, how does the quoted sentence stack up against design by base class vs design by interface? The whole "Framework Design Guidelines" angle is what I’m lining up against here. Suggesting that assemblies containing derived classes should be recompiled when the base class changes doesn’t feel practical in many – by no means all, or indeed most, but certainly a significant number of – situations.

  81. DevinB says:

    Two separate things

    First of all: Regarding Eric’s "being wrong"

    There are three solutions I can think of.

    1) Runtime check on every "base" call made ever. Every single one.

    At that point, you have successfully defeated the purpose of having compiled DLLs in the first place. You have carefully avoided the benefits of compilation, all in order to fix a very obscure scenario.

    2) All classes which inherit from classes with virtual methods ANYWHERE in their hierarchy (this is all classes, by the way) will automatically generate

    public override VirtualFunction(params) { base.VirtualFunction(params); }

    That would solve the issue, but again, when you call ToString() on anything, you would end up with a very tall stack trace, which would obviously have performance implications.  This would have a measurable impact on ALL programs.

    3) Check if the reference dlls have been modified

    When check? Store the information where? What do you do if it HAS been modified? Dynamically recompile my own dll? What if there are irrelevant breaking changes?

    This solution is impossible.

    Ultimately, Eric’s solution is the most pragmatic.

    Secondly,

    If after adding

     public override void M()
     {
         Console.WriteLine("Bravo");
         base.M();
     }

    to Bravo (which doesn’t get executed), you then choose to SEAL method M, it will cause a runtime error. Why is that?

  82. Gavin Greig says:

    I’m surprised that the opposition to this appears so strong. While I understand the concerns that some people have about libraries they don’t have source for and particularly plug-ins, those are inherently risky situations anyway. If they break they break; and you should be prepared for it as part of your risk assessment. You don’t have to be happy about it, but you should be prepared.

    I don’t think those concerns should override common sense; by default, compiled code should not change its behaviour when a change like this occurs – when completely unknown code is inserted where no method existed before. That should only occur after the new method’s been explicitly approved through recompilation (and retesting, etc.).

  83. Marcel Popescu says:

    Gavin, the problem is that this breaks *silently*. Breaking explicitly is always better than breaking silently.

    Security example: method M() validates access to an account. Alpha.M() checks some conditions. Later on, Bravo.M() adds another layer of checks – WHICH IS IGNORED. In what way is this a good thing?
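    A sketch of that scenario (hypothetical names and checks). Compiled fresh, all three layers run; but a Charlie.DLL compiled before Bravo added its override has base.CheckAccess baked in as a direct call to Alpha.CheckAccess, so Bravo’s new layer is silently skipped.

```csharp
public class Alpha
{
    // v1's only validation.
    public virtual bool CheckAccess(string user) => user != "guest";
}

public class Bravo : Alpha
{
    // Added in Bravo v2: an extra layer of checks.
    public override bool CheckAccess(string user) =>
        user != "blocked" && base.CheckAccess(user);
}

public class Charlie : Bravo
{
    public override bool CheckAccess(string user)
    {
        // Recompiled against Bravo v2, this base call reaches Bravo.
        // An old Charlie.DLL, compiled when Bravo had no override,
        // calls Alpha.CheckAccess directly instead - so "blocked"
        // users sail straight past Bravo's new check.
        return base.CheckAccess(user);
    }
}
```

    Here new Charlie().CheckAccess("blocked") correctly returns false, but with a stale Charlie.DLL it would return true – and nothing crashes to warn you.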

  84. Gavin Greig says:

    Marcel, fair point, and you might have convinced me (I’ll have to think a bit about it, but my first inclination is certainly to agree with you). However, that doesn’t seem to be the primary concern of most of those commenting, who want stuff to "just work" in line with their expectations, even though it’s been pointed out why that’s not a good idea. Maybe failing to crash is also a bad idea.

  85. v.reshetnikov@gmail.com says:

    I wonder what behavior the customer would expect if Bravo were modified as follows:

    public class Bravo : Alpha
    {
        public void M(params int[] x)
        {
            Console.WriteLine("Bravo");
            base.M();
        }
    }

  86. Focus says:

    I’ve been thinking about the issue a bit longer and I think I have changed my mind…the customer is right.

    The main reason is that the code is "failing" silently to do what the coder expects. The obvious argument is that the coder should know how the compiler works, but one of the main tenets of the C# team has always been to create a language that does what one expects. And judging by the vast majority of posts here, the compiler is failing big time, silently doing what no one expects.

    Eric uses security as an argument, but plenty of posts have also argued that the current implementation can be just as insecure, or even more so. If Bravo were to implement a new layer of data-access security, the client could actually be lulled into a false sense of security, as the new implementation in Bravo is ignored.

    Obviously, when all DLLs belong to the same developer, the current implementation isn’t a big deal. But when more than one developer is in the mix, things don’t seem to work as well.

  87. jsrfc58 says:

    Gavin Greig wrote:

    "I don’t think those concerns should override common sense; by default, compiled code should not change its behaviour when a change like this occurs – when completely unknown code is inserted where no method existed before. That should only occur after the new method’s been explicitly approved through recompilation (and retesting, etc.)."

    Agreed. And maybe that is where the confusion is coming in…I would expect my code to run properly IF we were talking about an interpreted language and I swapped some new code into a function. With compiled languages, my expectations are different. Although I have not done much work in Java, I would expect the JVM to more or less function like an interpreted language (in this respect), whereas with the .NET framework, I would expect it to function more like a compiled language. Yes, both platforms use bytecode, but the way the bytecode is handled is different.

    To quote Wikipedia:

    "On .NET the byte-code is always compiled before execution, either Just In Time (JIT) or in advance of execution using the Native Image Generator utility (NGEN). With Java the byte-code is either interpreted, compiled in advance, or compiled JIT."

    As a long-time C++ / recently-turned C# dev I find this somewhat amusing. In C++ there is no base/super keyword, and so you will likely write Alpha::M() – although you could write Bravo::M().

    A common reaction to the brittleness of this (with respect to changes in the class hierarchy) is to typedef up an alias, so you only have to change one thing when the class hierarchy changes:

    private:

       typedef Bravo base;

    [See http://stackoverflow.com/questions/180601/using-super-in-c]

    Thinking about how this actually behaves in C++, Eric’s point make sense, but when considering the reason why this idiom is employed in C++ I think it matches the mental model of the majority here.

  89. Pavel Minaev says:

    > As a long-time C++ / recently-turned C# dev I find this somewhat amusing. In C++ there is no base/super keyword, and so you will likely write Alpha::M() – although you could write Bravo::M().

    Two things of note here.

    First, there is __super available as an extension in VC++.

    Second, in the example that Eric gives (rewritten for C++/CLI), even if you use Bravo::M(), the actual call compiled by VC++ to MSIL will still be to Alpha::M().

  90. Pavel Minaev says:

    > There are two solutions I can think of.

    > 1) Runtime check on every "base" call made ever. Every single one.

    You miss one obvious solution: a single check at JIT time. It really only has to be done once per class load, because the entire inheritance chain for a given class is known at that point, and thus it is possible to determine, for any hypothetical "basecall" instruction in a body of a method of the class, which method it actually refers to. Once thus determined, it won’t change later at runtime.

  91. William Schubert says:

    Suppose you insert a bug fix into Bravo that makes Charlie work correctly. Charlie was created by the Type 2 team at a distant location, or there could be several different versions of Charlie created by various Type 2 teams around the world. Are you saying that all of them have to recompile their Charlie because we added a bug fix implementation in Bravo?

  92. Darren Clark says:

    @Gavin

    In addition to silently failing in cryptic ways, it is also possible to replace _some_ of the code in a base class and have some other code not execute. I still don’t see how this is ever acceptable. Either fail to load, or load fully – but don’t half-load.

  93. BW says:

    I’m impressed. Sounds like a sensible way to avoid (the .net version of) DLL hell.

  94. Mark Knell says:

    > just pick the best one – whichever one happens to be "most derived" in some future version of the world – by doing that analysis at runtime

    I propose we refer to this performance penalty as an "extra base hit".

  95. Richard says:

    I can see the arguments for both sides, but I tend to agree with the customer (and the majority of comments). Having to recompile *every* bit of .NET code every time there’s a service pack or security patch is not an acceptable option; neither is manually inserting no-op overrides of every virtual method from your base classes "just in case".

    If Bravo v2 breaks Charlie v1, you’ll always need to recompile. With the pre-2003 compiler, non-breaking changes wouldn’t require a recompilation; with the current version, they *might*, and you have no way of knowing until things start breaking in interesting ways.

    If the base method call works 99% of the time when Bravo.dll is replaced, and crashes 1% of the time when BravoCorp do something stupid, surely that’s better than not working quite as you expect 99% of the time and the other 1% also not quite working as you expected (but not crashing).

    In other words, if there’s something wrong, it’s better to crash than to do the wrong thing.

  96. Mario Cossi says:

    I’m completely with Eric on this. If you change a component you should recompile and re-test every component that references it – directly or indirectly.

    Even if we didn’t agree on this, changing the compiler as the customer requests would not achieve anything: there are many other situations in which our code would go berserk if we just swapped a DLL without recompiling. Constants and enumerations come to mind… maybe they can be fixed too, up to a point, but I cannot imagine how that might happen without giving up at least constant folding in the process. No, thanks.
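
    To make the point about constants concrete: a const is folded into the consuming assembly at compile time, while a static readonly field is fetched from the defining assembly at run time. A minimal sketch (the BravoInfo class and its values are made up for illustration):

```csharp
using System;

// Hypothetical type standing in for something in Bravo.DLL v1.
public class BravoInfo
{
    public const int MaxItems = 10;           // folded into consumers at compile time
    public static readonly int MaxSlots = 10; // read from Bravo.DLL at run time
}

public static class Consumer
{
    public static void Main()
    {
        // Compiles to the literal 10 baked into the consumer's IL.
        Console.WriteLine(BravoInfo.MaxItems);

        // Compiles to a field load resolved against Bravo.DLL at run time.
        Console.WriteLine(BravoInfo.MaxSlots);
    }
}
```

    If Bravo.DLL v2 changed both values to 20 and were hot-swapped in without recompiling the consumer, the first line would still print 10 while the second would print 20: the same "bound at compile time" mechanism that makes the base call stick to Alpha.M.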

  97. Patrick says:

    I read the article but skimmed through the comments so if this question has already been asked I apologize.

    What happens if the order of operations is changed just a little? What if M exists in Bravo at the time Charlie is compiled, and then M is removed from Bravo, which is recompiled (without recompiling Charlie)?

    I tried this and got "Charlie / Alpha" once I recompiled Bravo. Is this the correct behavior?

    If this is correct, then the scenario the "customer" wants can be achieved; it’s a strange workaround, but it can be done.

  98. Stefan Rusek says:

    Wow, I was annoyed when I read your article the first time, and now I am downright angry. You give advice that totally contradicts the entire versioning model of the CLR. You said:

    "If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy."

    Yet one of the core features of .NET is that it is generally backward compatible. In fact, the CLR team seems to take pains to keep it backward compatible. (A bunch of stuff changed from 1.1, and some backward compatibility was broken, but the majority of my 1.1 code and DLLs run fine on 2.0 and later.) It would be absolutely absurd if we had to recompile everything every time the CLR is updated or patched, something that just can’t be done without modifying base classes.

  99. Anon says:

    Intuitively I side with the customer, but practically I side with the status quo.

    Is it counter-intuitive? Yes.

    Should the JIT’er do the base resolution? Maybe.

    BUT … who in their right mind would ever just chuck in a new DLL without testing in the first place? In my entire life I’ve never had an update from a third party that didn’t break some existing code. Updates from your own team are worse, as they generally don’t follow any sort of rigor when designing assemblies.

    Nobody said creating APIs was easy, and how many project DLLs do you have that are 1.0.0.0?

    It does raise the question, though, of why the runtime lets you do this in the first place if it’s so dangerous and error prone. If everything truly is that dependent, then the runtime should kill it instead of allowing it to continue. Maybe something like: the expected checksum of Bravo.dll is 123 but on disk it is 456 -> fail.

    Maybe the solution is to have an attribute on the class that will instruct the compiler to create stub methods for all overrides. That way (assuming there are no other breaking changes), if you need to replace Bravo.dll, when compilation occurs the compiler can say "OK, I’ll give you stub methods so all base calls from your descendants will always go to you", which lets you do this replacement scenario.

    At least that way the CLR doesn’t change, the compiler has minimal change, it’s very explicit, and the safe/good defaults stay in place.
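
    For what it’s worth, the effect of those stub methods can be had today, without a new attribute, if the base-class author hand-writes pass-through overrides. A minimal single-file sketch of the idea (collapsing the three DLLs of the post into one program):

```csharp
using System;

public class Alpha
{
    public virtual void M()
    {
        Console.WriteLine("Alpha");
    }
}

public class Bravo : Alpha
{
    // Hollow pass-through override, present from v1 onward. Because it
    // exists, the compiler binds Charlie's base.M() to Bravo.M, so real
    // code added here in a later version of Bravo would be picked up
    // even by an already-compiled Charlie.
    public override void M()
    {
        base.M();
    }
}

public class Charlie : Bravo
{
    public override void M()
    {
        Console.WriteLine("Charlie");
        base.M(); // compiled as a non-virtual call to Bravo.M, not Alpha.M
    }
}

public static class Program
{
    public static void Main()
    {
        new Charlie().M();
    }
}
```

    The output is still "Charlie" then "Alpha"; the only difference is that the base call in Charlie’s IL now targets Bravo.M, which is exactly what makes the hot-swap scenario work.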

  100. Rayviewer says:

    The question is where the “middle” is. Is it within MSIL, or JIT/NGEN?

    An object using polymorphism has a hidden “vtable”. When the vtable is bound to the object is the key question dividing the two sides.

    Eric’s side wins if people agree that the vtable should be built when the C# compiler is compiling the code.

    Eric’s side loses if people agree that the C# compiler should just produce the MSIL, and that JIT or NGEN should do the binding based on the managed DLLs actually in use.

    Personally, I like the first mental model at work. But from a backward-compatibility point of view, I support the second mental model, because that is how an unmanaged C++ vtable would look, and by that time there are no unknowns.

    I think the ball should roll back to the CLR team. Or, if the problem is too large to be addressed there, someone should redefine what “virtual” means in the .NET Framework instead of adopting the “virtual” from C++.

  101. Jan-Olof Stenmark says:

    Exciting reading as always in your excellent blog Eric.

    I fully understand that this topic is not a clear-cut question with an "everyone is happy, no problems whatsoever" answer.

    I thought that I would look at another real-life scenario. (Apart from the issues we have with this in a product at our company.)

    I looked at the service pack changes made in one of the WPF-assemblies by Microsoft.

    I compared the 3.0.6920.0 version (the version you get if you installed the original 3.5 framework) of PresentationFramework.dll with the version I get if I update it using Windows Update (3.0.6920.4902).

    There were some methods and properties that had new overrides in the new service-pack version. Some of the methods were:

    System.Windows.Controls.TreeViewItem: MeasureOverride

    System.Windows.Controls.VirtualizingStackPanel: OnGotKeyboardFocus and OnLostKeyboardFocus

    System.Windows.Documents.Table: BeginInit and EndInit

    I haven’t checked what code was added in the methods mentioned, but I suspect that at least some bad things will happen if it is not executed by derived classes that were compiled with a version earlier than 3.5 SP1.

    Keeping binary compatibility is not easy. You constantly have to make decisions what changes can be made, and what impact will they have. Like some of you have mentioned before me, there are some things you can change and some things you can’t, and still have it binary compatible. But not being able to add new overrides in classes and be somewhat confident that the base-code will be executed (by the standard base-call), severely limits the things you can change.

    Apparently at least the WPF developers seem to think so, since they have created new overrides on some of the classes in a service pack. Or maybe those changes were necessary, and they estimated that not that much derived code would break by releasing the service packs. (My guess is they didn’t think they were introducing problems just by overriding those methods.)

    Of course, this doesn’t mean that this behavior has to change. It’s just one more example of what some (and I go out on a limb and say, most) developers expect from the "base-call". The solution of course is, when in doubt, recompile. But that equates to: "Every time Microsoft releases a new hotfix or service pack, recompile and update all installations of the application." That is not a very preferable scenario.

    Disclaimer: I compared the 3.0.6920.0 version from a Vista machine with the 3.0.6920.4902 version on my Windows 7 machine. Since I haven’t looked at the actual code added in the SP, all the changes they made could possibly be of a kind that doesn’t have to be executed. Or they expected the changes to be breaking changes. My analysis of the assembly may also be faulty. If so, my apologies to the WPF team for thinking that you didn’t fully understand all the details of how the compiler/framework works.

  102. np says:

    I don’t understand these comments.  The majority seems to be saying "I want to be able to change the method signatures of my base class without recompiling any derived classes."  That’s just crazy. Even by adding an override, you are changing the exposed surface of the class.

    As I understand it, you don’t _have_ to recompile Charlie.dll every time you make a change to Bravo.dll, but only if you add a method or change the signature of an exposed method. That sounds reasonable to me.

    If you change a _method_ signature, you have to recompile all code that references it, yes?  Likewise, if you change a class signature, you have to recompile all the classes that derive from it.  How can you safely derive from a class if you don’t even know what methods it exposes?  Besides, there are other ways to accomplish this without all the fuss.

  103. Random832 says:

    nikov said: I wonder, what behavior the customer would expect, if Bravo were modified as follows:

    The parameterless M() method is still the best match. You would need a hide-by-name method (e.g. if it were written in Visual Basic with the Shadows keyword) for this to change even if you _did_ recompile Charlie. And even if you’d picked a better example (say, Alpha had the params object[] overload), there is still the difference that it is overriding the _same_ method, rather than being a different method with the same name or the same signature. You’d no more expect to pick it up without recompiling than if it were another method added to Alpha.

    Incidentally – Intellisense erroneously shows hidden methods as overloads in a hidebyname situation (even though C# itself prevents access to the hidden methods without a cast)

    Gavin Greig said: I don’t think those concerns should override common sense; by default, compiled code should not change its behaviour when a change like this occurs – when completely unknown code is inserted where no method existed before. That should only occur after the new method’s been explicitly approved through recompilation (and retesting, etc.).

    Except it is not a case of "no method existed before"; and, taking your argument to its logical conclusion, a new DLL should only be usable _at all_ after recompiling. What’s so special about a new override (not a new method or even a new overload) that requires it to be "explicitly approved through recompilation (and retesting, etc)", when other code changes in the library do not? Changing an "existing" override [whose body was a single base call] to do something else, or adding an override in Bravo that gets called _on an instance of Bravo_, requires no such approval. You haven’t defined "a change like this" clearly enough that your interpretation counts as ‘common sense’ while the existing behavior in all [any!] other cases also counts as ‘common sense’.

    BW said: I’m impressed. Sounds like a sensible way to avoid (the .net version of) DLL hell.

    Sounds more like a way to create it. If Bravo.M2 [which exists in both versions] is modified in version 2 to depend on Bravo.M having been called, that code _will_ silently run _without_ Bravo.M having been called. If you’re going to say, as a philosophical point, that any change to a base class ought to require recompiling any code that derives from it, then you need to make sure that _any change to a base class_ will make classes derived from previous versions fail to run, rather than merely silently leaving out _some_ (but obviously not all) of the changes.

  104. GRico says:

    I’m sorry to say that the C# team is wrong, wrong, wrong on this one. I’ve never run into this issue, and I don’t think too many have, since it hasn’t cropped up all that much, but nonetheless it’s IMHO a pretty big problem that should be fixed ASAP.

    I’m not really going to go into any technicalities, but my reasoning as to why the current behaviour is so wrong goes as follows:

    Imagine the original Bravo.dll implementation were something like this:

        class Bravo : Alpha
        {
          protected override bool DoM()
          {
            // ...
            return whatever;
          }
        }

    Charlie’s implementation is:

        class Charlie : Bravo
        {
          public override void M()
          {
            if (base.DoM())
              base.M();
          }
        }

    When I see this code I understand (at least up to now) base as the immediately less-derived class in the inheritance chain. In this case that would be Bravo. I don’t really care whether the implementation of the methods I’m calling is in Bravo or in Alpha; I’m still calling THROUGH Bravo, because that’s what base intuitively means to me.

    If we suddenly change the Bravo implementation and add an override of M(), what happens with the code in Charlie? Well, we have the really bad situation of having base mean two entirely different things in the same declaration body. That is, base.DoM() really means base.DoM(), but base.M() actually means base.base.M().

    I’m sorry but that is plain wrong and it goes against all that Eric has written in previous posts about the design philosophy of C#.

  105. GRico says:

    I should add that base in the current implementation always means two different things. But it’s when we change the implementation in Bravo that the "two meanings" ambiguity becomes obvious.

    I can find thousands of places in this blog where Eric has stated that one of the C# design principles is that something can’t have two meanings in the same declaration scope. Sadly, that seems not to be the case with the base keyword. I just found out, though, thanks to this blog.

  106. Chris B says:

    This is clearly a thorny issue with strong arguments on both sides. Even in light of good arguments, I still feel that recompiling when one of your dependencies changes is the safest thing to do. If the objects in question were interface implementations, I think everyone would agree that it is not sensible to hot-swap in a new interface definition that had added methods since the previous version. The key difference from that case is that with a virtual call there is a fallback candidate. The CLR’s behavior of falling back to the next candidate in the virtual call chain is as worrisome to me as the compiler’s current behavior. I would prefer the CLR to validate that the runtime method chain is consistent with the compile-time method chain, and error out if not. So in this case, the CLR could detect that in Charlie.M(), base.M() resolves to Alpha.M(), bypassing Bravo.M(), which is almost certainly not intended.

    IMHO, binary-compatible changes should be limited to tightly encapsulated implementation details. Protected methods are clearly visible from the outside, and that makes them part of an object’s contract. When the contract changes, beliefs which were previously true cannot be assumed to still be so.

  107. Bill Sheldon says:

    I wouldn’t expect this to change. To me, in addition to the logic above, this seems like an intuitive part of static or pre-compiled code versus a dynamic language. In a dynamic environment I can swap out B and have different results. In a static compilation (even if I’ve only compiled to an interim language) I can’t change source code or swap out a portion of the compiled stack.

    If you prefer the behavior associated with being able to swap out a portion of the class hierarchy, use a dynamic language. If you expect static class-inheritance behavior, use a static language. Some static languages offer a dynamic option (C# 4.0 being one such), and within that dynamic or interpreted area I expect the behavior to support changes to the class definition after compilation.

    The advantages of compilation are a static environment and speedier execution, while an interpretive or dynamic environment allows for greater flexibility, with a corresponding difficulty in testing all of the alternatives. This is part of the evaluation of which language to use when creating a solution.

  108. GRico says:

    @Bill Sheldon

    I don’t agree, as the current state of affairs does in fact let you swap as you say with no problems whatsoever. A separate issue is whether this should be done without recompiling Charlie.

    If the original implementation of Bravo.dll did have an override of M, and I decided to change the implementation of M later on and issue a new version of Bravo, Charlie would consume the new version fine. It’s the fact that sometimes Charlie WON’T notice the changes done in Bravo that is unexpected and IMHO not right.

    These seemingly equivalent implementations of Bravo turn out to be quite different and that is disconcerting to say the least.

    class Bravo : Alpha { }

    class Bravo : Alpha { public override void M() { base.M(); } }

  109. Gregory says:

    This has been stated before; I just want to emphasize it once more. How can we be sure that UI libraries built on top of Windows Forms or Web Forms or WPF (ones with deep inheritance hierarchies), compiled under .NET 3.5, will still run normally on .NET 4?

    It’s quite possible that in some control a new override appeared in 4.0, and the developer who put it there assumed that this was a backward-compatible change that did not need documentation.

  110. Tobias Berg says:

    To the commenters who think that you should always recompile all applications whenever you swap in a new DLL: you seem to be missing the point. The point is that you *can* swap in a new library DLL, and most people thought that there were only two possibilities when you did that:

    1. The library contained no breaking changes and the application would still work, running the code in the new version of the library or

    2. The library did contain breaking changes; there would be a runtime error and you would be forced to recompile, and perhaps change some code that depended on the library.

    Now we see that there is a third possibility, namely that the library can contain changes that don’t cause a runtime error but will silently fail to run the new code.

    If adding new method overrides is a breaking change, then it should cause a runtime error when swapping in DLLs containing new method overrides. This would probably break a great number of applications all over the world…

    If adding new method overrides is *not* a breaking change then those new overrides should not be bypassed and the base call should work the way most intuitively think it does. I cannot see any way this can break existing applications, the only real counter-argument is that this means work for the compiler team (and maybe for the CLR team too) and since they don’t have unlimited time they have to prioritize and there may be other things that take precedence.

  111. Geert Baeyaert says:

    We shouldn’t forget that object declares virtual methods as well.

    public class DerivedClass : BaseClass
    {
      private readonly int _i;

      public DerivedClass(int i)
      {
        _i = i;
      }

      public override bool Equals(object obj)
      {
        return _i == ((DerivedClass)obj)._i && base.Equals(obj);
      }

      public override int GetHashCode()
      {
        return base.GetHashCode() ^ _i;
      }
    }

    What if BaseClass changes, and overrides Equals?  DerivedClass would ignore this change, and call object.Equals() instead.

    This is not what I expected at all.

  112. DLL says:

    > I wouldn’t expect this to change. To me in addition to the logic above, this seems like an intuitive part of static or pre-compiled code versus a dynamic language.

    Well, doesn’t DLL mean "dynamic link library"? 🙂 Maybe it’s time to consider a different file extension for assemblies.

  113. Morten Mertner says:

    I’d like to join the majority of the commenters and endorse the customer’s point of view.

    The examples from Tobias Berg and Steve Bjorg really show that there are valid scenarios for expecting a completely runtime based resolution of method calls. Being forced to add dummy overrides in your assemblies just to ensure correct future behavior seems to negate the purpose of having virtual calls in the first place, and certainly renders the "compile-time resolution is safer" argument invalid.

  114. Danyel says:

    np: the fundamental question here, I think, is whether the user IS changing the signature or not. I don’t think they are; but that’s because I had assumed that every method I didn’t implement really did have that little magical implicit "send it on up the chain":

    class Bravo2 : Alpha { public override void M() { base.M(); } }

    This is how a LOT of things work, I think. Routed events will route through everything in the middle, whether they declare a handler or not, and if I add a new event handler half-way up the chain on the fly, my code accommodates it.

    So, +1 for the customer.

  115. Jed says:

    I fully agree with the customer.

    The above arguments are not convincing.

    I also agree with Sandro’s comment. Why would you allow virtual resolution to a private method? Is there any practical or even technical value for this? As you said, "private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them". Fix this at the CLR level and then you are free to generate the appropriate code that nearly all of us are expecting.

  116. Mario Cossi says:

    @Tobias Berg:

    Your scenario is indeed compelling, but it is unfortunately far from reality:

    1) there could be breaking changes in Bravo.dll that do not cause Charlie.dll to fail at all. A trivial case in which this is true is when Charlie isn’t using all the features of Bravo, so the modified code is either never called, or is called with parameters that will cause the same results as before to be generated.

    2) there could be breaking changes in Bravo that cause Charlie to generate invalid results without crashing. I proposed the most blatant one (a change in the values of an enumeration or a constant), but there are many other cases in which this is true.

    3) a non breaking change in Bravo could still cause Charlie to fail, with or without a run-time error. This is common whenever the author of Charlie relied on bugs or undocumented features of Bravo (just to make a common example, Charlie might rely on the order in which events are fired by Bravo).

    As you see, the concept of "breaking change" is an elusive one… the only definition that makes sense to me is that a change in Bravo is breaking (with respect to Charlie) if it causes Charlie’s test suite to fail. But since Charlie’s test suite isn’t going to be around when you swap in the new Bravo, re-testing is mandatory.

    Which boils down to Eric’s original statement.

  117. Michael Starberg says:

    *decloaking, delurking*

    This must be one of Eric’s best posts ever.

    Partly for his original post but mostly for all the comments. Awesome.

    Eric updated his article and said there are two different mindsets.

    Still he gets flak for framing it as ‘customer’ vs. ‘Eric’.

    I still find it amusing that people think Eric is the ‘supreme leader of .NET’ just because he blogs about C#.

    And on that chord: while Prof. Barbara Liskov had a lot to say about OO, she does not hold a badge or have any mandate on how to implement OO. Nor do C++ and Java, nor Pascal, etc., ad nauseam.

    Me, myself and I were totally amazed by all the comments voting for the ‘customer’s’ view.

    That assemblies are not easy to hotswap we all must understand.

    It is even on the exam if you want to get certified on .NET and C#.

    What is the difference between a ‘const’ and a ‘static readonly’?

    Your answer can earn you +1 or epic fail on the exam.

    How assemblies works is mandatory knowledge if you want Microsoft to call you a pro.

    But I had no idea that assemblies were so bound at compile time that a hot-swapped .dll might even affect ‘normal’ calling conventions on base.

    Guess you learn something new everyday.

    Let’s take this down to a level that does not involve ‘customer’ and ‘eric’.

    Let’s keep it simple. I’m with Robert Davis on this one: recompilation is cheap!

    If you hot-swap a .dll without any tests or even a compile, you are back in DLL hell. There is no pity for you.

    I would never ever ‘inject’ a new .dll without compiling. That’s just wrong.

    Summary:

    If you ever get burned by this ‘static behavior’ you should seriously consider your deployment strategy.

    Fast poll:

    Who ever got burned by this?

    1. me

    2. not me

    3. won’t tell

    4. don’t know.

    My guess is that the ‘customer’ voters will vote 4. I wanna be in category 2.

    Please don’t ‘fix’ what is not broken.

  118. Gabe says:

    I just can’t believe that the CLR designers really expected us to recompile our apps every time a DLL is revised.

    Imagine that you’re a vendor with a graphical control library that depends on WPF. Every time a new service pack is released for WPF, you need to recompile your library, right? Only recompiling isn’t enough, because you also need to get your new DLLs to your clients. And your clients aren’t the end users; the clients of your custom control library are app authors. This means that your clients’ apps will all start misbehaving when *their* customers download a service pack.

    So every time there’s a new revision of WPF, you have to compile a new set of DLLs, distribute them to your clients (assuming you even know who they are), and then they have to distribute the new DLLs to their customers (assuming they even know who they are).

    But wait, there’s more! Your clients don’t know which revision of WPF will be on their customers’ machines so they have to ship an install package with a different set of DLLs for each possible WPF revision. That way the right set can be installed at setup time, and the correct version can be swapped in if the app detects that the user has installed a WPF service pack.

    How is this different from any other change in a service pack that breaks your DLL? Almost any other kind of breaking change is something you can detect by looking at the version and adjusting your behavior accordingly. Changing a base class’s overrides, though, requires a separate version for each revision of the base class DLL because while each revision could add overrides, it could also remove them. This means that there is no single version of your DLL that can work with all revisions of the base classes.

    I don’t know about you, but I would prefer a system that *didn’t* require what I just described.

  119. DRBlaise says:

    I would also like to put in a WOW! I imagine that the debate within Microsoft was even more passionate than the responses in these posts. Especially since "the team all agreed that the desirable behaviour was to always dynamically bind to the closest base class".

    It had to be very frustrating for the C# team not to be able to create the functionality they thought was best, and then to actually have to make the situation worse for their users between versions 1.0 and 2.0 so that the C# compiler and other tools would be simpler and faster. I bet this was a bitter pill for some of the team to swallow.

    Thanks for sharing some of these intimate details of the decision process of the creation and evolution of C#.

  120. Michael Starberg says:

    Gabe: It could be IUnknown, IDispatch and mismatching GUIDs, as in a DLL-hell saga. That’s fun, if you have the time for it. regsvr32 was old-school by the time it was called COM+. Now you have to drag and drop your COM+. That is just on the plus side. =)

    Maybe above was a red herring and a moot point, but:

    As far as I can tell, in the real world you sign your ‘dynamic link libraries’ and go by public key token in your .config, telling what is supported and what is required (which are almost opposites compared to everyday speech). If you do it right, the ‘new Bravo’ wouldn’t even be loaded just because you put a file in some folder.

    I have such a hard time figuring out when this would ever be an issue?

    For real, if the Bravo dude codes his Bravo.dll and then later figures he should ‘just add some code that might break stuff’, there is something seriously wrong with the Bravo package. Maybe we shouldn’t have used Bravo in the first place? If Bravo was commercial and expensive, can we haz our monziez back nao?

    If you are giving me a Bravo2.dll, fine. If you give me Bravo.dll and do a .M2() or even a .M17(), you’ve been doing too much DirectX and need therapy, by the idiom that you never break an interface/contract. But if you give me a Bravo.dll that looks and smells like a Bravo.dll, and you ‘forget’ to tell consumers that your new override of .M() has breaking changes in the OO hierarchy, then I am all of a sudden all for capital punishment.

    I’m not that smart, so I would like dudes like Eric, and some Java dudes, to analyze what the ‘cost’ of a runtime check to make the OO academics happy would be. .NET already starts slowly, especially ASP.NET. I have no idea: how much slower would a runtime check be?

    The proper solution, as far as I can tell, was already provided above. For the Bravo team it is: just add hollow calls to base for all virtual (virgin) overrides. I’d put that into the ‘refactoring’ bucket, and if Visual Studio 2015 won’t do it, maybe ReSharper will sport such a feature for the few people that need it.

    Now, I just love to be proven wrong. I usually am. But please, pretty please, give me a real-world example of when this would ever be an issue?

  121. DRBlaise says:

    @Michael Starberg – Did you actually read Eric’s original post? Eric actually gives a real-world example: "Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently."

    It really is not very hard to imagine real-world examples. Try using your imagination.

  122. Michael Starberg says:

    DRBlaise: Did you read all the comments? Also, I hardly think there is a Bravo.dll. I _am_ trying to use my imagination. I even tried closing my eyes. Still zero. I see nothing. Do you? Please help me understand why you would hot-swap a .dll in the first place without at least doing a Ctrl-Shift-B on your local laptop before you even commit to the repos.

    But let’s say there was a true case. Eric says there is at least one; hence at least an array. How many are bothered by this? 7 people in the world? 130 people? Would 16 bits be enough to hold them? Should I go 128-bit just to care for future people who complain? Even so, I say it is their pain.

    I will hold my ground and stand by my position: if you put any runnable file on disk and then open it without knowing what you are doing, without any documentation, then you might get surprises.

    Guess I just don’t like the idea of hot-swapping DLLs all willy-nilly. Been there, done that. Enough already!

    Back to the ‘const’ vs. ‘static readonly’ question. Miss that one, and you will flunk the entire test. That is how Microsoft sifts C#-dudes from Java-dudes. Just kidding. Or am I? I don’t know, as I passed. =)

    Fair to say is that Eric really tries to explain how stuff works.

    Can we at least agree that that is a virtue and makes us all happy?

  123. Michael Starberg says:

    Oh, I think the clue is in the title: ‘Putting a base in the middle’. =)

  124. commongenius says:

    @Michael,

    Not sure why this is so difficult, but here goes:

    1. You distribute a .NET 3.5 application to your customers.

    2. Customers install .NET 3.5 SP1.

    3. Hosed

  125. Michael Starberg says:

    genius: So you are placing/voting yourself in category 1?

    What I have found so far is that when you go .NET 4 with Code Contracts and then have to revert to .NET 3.5 in a panic, msbuild.exe gets very upset and refuses to build. Just a few lines in the .csproj XML that leave the build confused.

    And so I have to manually compile and deploy .dlls via FTP until .NET 4.0 sails. Hence I wrote that I wanna be voting 2, but am factually in cat 1.

    Well, not really, as this is just a few lines of XML that could be done away with. I’ll stick with doing it manually until 12 April.

    Would you still be ‘Hosed’ if you recompiled, sifted through all the warnings, and set a ‘supported up to 3.5 SP1’ flag that would make Windows Update kick in, assuming we are talking Windows OS? I am not nagging, but asking.

  126. Michael Starberg says:

    genius: Not begging the question correctly, and this is getting off topic, but doing your 1. 2. 3. and getting ‘Hosed’, isn’t that a good thing? Of course not. Especially if it is transactional data, or Jack Bauer remoting a nuclear power plant. But surely SP1 is better than vanilla and worse than SP2?

    Your users can’t plan for a hypothetical upgrade of the OS, the .NET runtime, or its frameworks and BCL. But you can plan for it. Or wait, your entire point is that you can’t plan ahead. I stand corrected. Luckily, thanks to this post, now we know we shouldn’t leave ‘virgins’ empty in C#. I kinda start to like that keyword. If C++ has friends, why can’t C# have virgins?

    But no. I am trying to say that I would not like to take a performance hit just to solve the anti-pattern of ‘base in the middle’. I don’t know what that performance hit would imply, but to me it sounds like swatting flies with a bulldozer.

    Hehe, I don’t think Liskov is following this thread, but if she were, she’d probably be hiring assassins. Oops. Eric can always hide behind ‘design notes’, but I am taking a true stand for ‘NO FIX’.

    It’s not a bug. Working as designed.

    One good thing has come out of this, though. I bet you all an atto-dollar that the C# crew is Etch-A-Sketching on private overrides for nested classes. If not virgins, nor friends, what do you call those? Poked? Whoops.

    Seriously

    – Michael Starberg

  127. DRBlaise says:

    Eric, I agree with your latest update that some of the comments have been a little dramatic and emotional, but I have to say that your latest update seems somewhat defensive and emotional to me. You have not added any additional information; you have basically re-stated your points and added some BOLD typing. It reminds me of an SNL skit about Pres. Bush: “… we are working HARD for the American people … working late … working weekends …”

    In my opinion, it is disingenuous to compare the C# 1.0 “new private method” bug with the C# 2.0 “new override” bug. They may both be rare, but when comparing them to each other, I would say the C# 2.0 “new override” bug has to be at least 100 times more likely to happen, has more serious and subtle consequences, and is harder to debug. (I understand that you and a few others do not believe the “new override” behavior is a bug, but I strongly and emotionally disagree! :))

    I strongly disagree that I am disingenuous. I strongly disagree that any members of the design team are disingenuous. I also strongly disagree with any implication that we do not make every decision with the needs of the customer in mind. You might not always agree with those decisions (and neither do I). And indeed, not every decision we make turns out for the best; we are imperfect humans working with imperfect information, like everyone else.

    As for the likelihood of the change producing an error: the whole motivation for this article in the first place was that this issue was first reported to me a couple of months ago, and then again a couple of weeks ago. Neither were bug reports describing an actual user-affecting problem; both were “I happened to notice that this was the compiler behaviour, is it by design?” Of course, not every question from every user gets run by me, but the fact that I’ve seen this question a grand total of twice in the last five years tells me that this is not exactly cutting a swath of destruction through the industry. It’s a small unexpected “gotcha” behaviour among many thousands of other such small unexpected behaviours, and I don’t see what the big deal is. Compare it, say, to “lambdas close over the variable of a loop, not its values”, which I get a couple of times a week; that one really is hundreds of times more likely, and we’re probably not fixing it either.

    My point is, and continues to be, that when you make a change to a base class, you’ve got to test your derived classes, and possibly recompile them. I don’t see this as controversial, surprising, or onerous.

    — Eric

    From the developer notes you shared, after the failure to get the CLR team to make the necessary change to implement the “desired” functionality, the C# team’s main reasons to make the change was to simplify and speed up the compiler and other tools.  I can understand these reasons, and appreciate you sharing them with us.

  128. DRBlaise says:

    Eric, thank you for your response and your honesty – I appreciate your passion to defend yourself and your team. I am sorry that I implied that you or your team are disingenuous in any way. I should have written: “It is my opinion that you are fooling yourself if you think the two bugs can be compared equally.” But I wanted to expand my vocabulary after your use of “rather histrionic comments.” (I had to look up the definition.)

    Thanks again for this blog and the great insights you provide on a weekly basis.  Please continue to provide the unique looks into the C# design team’s decision process.  I do know that your decisions are made with us in mind and that a lot of decisions are difficult compromises.

  129. Michael Starberg says:

    Eric: Thanks for not being ‘disingenuous’. I don’t even see the gotcha. It is working as designed. To boldly go where no language has gone before.

    Maybe it would just be what Monty Python did with the ‘smart penguin 2 metres tall’ sketch, with John Cleese narrating ‘This was later known as the complete-waste-of-time theory’, but my curious mind would still like to get some idea of what a ‘fix’ would imply.

    I am sorry to say that I don’t have the time in my life, nor the skillz, to write my own compiler and runtime and test what a ‘true virtual’ call would cost. And I am not asking you to mess with the compiler. It seems like Java does it. But you could use your brains and insights and maybe do a new post on what a change would take.

    Though I would rather see the C# team spend time/brains on private overrides for nested classes. Now THAT is not a complete-waste-of-time theory.

  130. Michael Starberg says:

    DRBlaise: Well, I am one who does appreciate *your* passion. Why can’t coding and compilers be an emotional topic? I’ve read every word you wrote and actually learned something. However, it is safer to do Star Trek and Python jokes than to compare what you think is a ‘flaw’ with George Bush. That is taking it way off-topic.

    I have also learned something that I have seen before. If you want to get Mr. Lippert’s attention, all you have to do is insult his brain. For me this is still an untested theorem, but the day I really want his attention I am so gonna play the disingenuous card. From a sample of one, it seems to work =)

    Happy Easter

  131. ficedula says:

    "My point is, and continues to be, that when you make a change to a base class, you’ve got to test your derived classes, and possibly recompile them. I don’t see this as controversial, surprising, or onerous." –Eric

    Having come in late to the discussion, I think the problem is that if you restate the situation – as a number of commentators have! – it’s more obviously problematic:

    "When you – or somebody else like Microsoft or another upstream library vendor – makes changes to a base class, you’ve got to test your derived classes – although you may not know what you’re even testing *for*! – and possibly recompile them. And do this before any of your users in-the-field install, say, a .NET service pack containing the new base classes."

    If a .NET SP comes out that adds some new overrides (as has been indicated, this does happen!) your classes will now not be calling those overrides. The fact that, say, your call to base.PreInit() "bypasses" the new override may – in the worst case – open a security hole. How are you meant to test for that? You don’t have full details of every change MS has made and what the possible ramifications of "bypassing" a new override are! I guess to be safe, you have to assume that bypassing the override may be problematic, and *always* recompile and release a new version of your software.

    So it sounds like the only "correct" way to deal with the situation is, whenever a .NET SP comes out, email all the users of your software and tell them under no circumstances to install it until you’ve had a chance to recompile all your software and release updated versions compiled against the new libraries. God help them if another vendor orders them to install the SP straight away because it contains security fixes…

    (Luckily the fact that Eric’s not encountered many problems like this implies that while there are pieces of software out there ‘bypassing’ the new overrides in some situations, none of them have caused any security or other serious issues. That we know of. Of course, maybe Microsoft is particularly careful to make sure no new overrides added in service packs will cause any problems if they’re ‘bypassed’ but this seems unlikely – correct me if I’m wrong!)
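    The “bypassed security override” worry above can be sketched in miniature. The following is a toy model in Python, not C#; every name in it (Page, SecurePage, MyPage, check_request) is invented for illustration and does not correspond to any real framework API. It only models the call binding: the already-compiled derived class behaves as if its base call were hard-wired past the middle class.

```python
# Hypothetical sketch of a service pack adding a security override that an
# already-compiled derived class silently bypasses. All names are invented.

checked = False  # records whether the new security check ever ran

def check_request():
    global checked
    checked = True

class Page:                  # the original base class
    def PreInit(self):
        return "init"

class SecurePage(Page):      # a "service pack" adds a validation override
    def PreInit(self):
        check_request()      # the newly added security check
        return super().PreInit()

class MyPage(SecurePage):
    def PreInit(self):
        # Already-compiled code behaves as if it said Page.PreInit(self):
        # it binds straight past SecurePage, so the new check never runs.
        return Page.PreInit(self)

MyPage().PreInit()
print(checked)  # False: the service-pack check was bypassed
```

Nothing here failed or threw; that is exactly the problem ficedula describes – the bypass is silent, so there is nothing obvious to test for.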

    @Michael: Doesn’t Eric touch on the required changes in the article itself – basically, the CLR already supports exactly the required behaviour so it would "just" involve changing the CLR opcodes output by the compiler for a base method call – no runtime changes required?

  132. Random832 says:

    @Michael Starberg, So do you recompile separate versions of your unmanaged applications for each version of windows? (new version of kernel32.dll!)

    The simple fact is, there are breaking changes and non-breaking changes. Adding an override to a virtual method ought to be a non-breaking change (assuming the new code is specifically intended to work in the situations the old code is called in, which is rather the point of virtual methods now isn’t it) just as modifying the body of an existing method is. In any other situation than a base.M() call from a derived class, it _is_ a non-breaking change. You seem to take the extremist attitude that _every_ change should be considered a breaking change: "If you hotswap a .dll without any tests or even a compile, you are back to dll-hell. There is no pity for you."

    As for the real issue… The "base.x" syntax conceals the fact that you are making a non-virtual call to a class other than the actual base class. If C# is truly meant to work this way, the syntax should be something like Alpha::M, with Bravo::M being a _compile-time error_ when there is no override. This would incidentally allow you to continue calling Alpha::M (and skipping Bravo::M) after the change is made, until you not only recompile but also edit your source.
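    The distinction Random832 draws – a target baked in at compile time versus a “nearest override” resolved at run time – can be made concrete with a little toy model. This is Python, purely illustrative; the two method names below are my own framing of Eric’s Alpha/Bravo/Charlie example, standing in for the token the C# compiler emits versus the CLR fallback behaviour.

```python
# Toy model (Python, illustrative only) of the two binding strategies
# discussed in this thread, using Eric's Alpha/Bravo/Charlie names.

class Alpha:
    def M(self):
        return ["Alpha"]

class Bravo(Alpha):
    # v2 of Bravo.DLL: the newly added override.
    def M(self):
        return ["Bravo"] + Alpha.M(self)

class Charlie(Bravo):
    # What the C# compiler actually emits: a non-virtual call whose target
    # (Alpha.M) was chosen at compile time, when Bravo had no override.
    # That target is baked in and does not change when Bravo changes.
    def M_compile_time_baked(self):
        return ["Charlie"] + Alpha.M(self)

    # What the customer expected: resolve "nearest base override" when the
    # call runs, which the CLR's call-instruction fallback would permit.
    def M_runtime_resolved(self):
        return ["Charlie"] + Bravo.M(self)

c = Charlie()
print(c.M_compile_time_baked())  # ['Charlie', 'Alpha']
print(c.M_runtime_resolved())    # ['Charlie', 'Bravo', 'Alpha']
```

The first call is “Charlie / Alpha” no matter what Bravo.DLL says, which is the surprise the original post describes; the second is the “Charlie / Bravo / Alpha” the customer expected.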

  133. Focus says:

    I agree with what GRico said somewhere above.

    base can actually have two very different meanings in the same declaration space. It might be intended that way, but it doesn’t feel right at all.

    protected override void Whatever()
    {
        if (base.whatever())
            base.whateverElse();
    }

    If whatever() is overridden in the "base" class (understanding the base class as the preceding class in the inheritance hierarchy) and whateverElse() isn’t, then the base keyword unexpectedly has two altogether different meanings.

    I wasn’t aware of this behavior at all. To me, base.X was intuitively a virtual call, so all this is new. I still think that C# made a mistake here, as my intuition seems to coincide with that of the majority of posters, and probably with the majority of coders, who aren’t even aware of this issue at all.

    I’d like to point out that this is completely independent of the "dll swap" problem this post talks about, and of whether you should recompile or not. That issue only makes the dual meaning of the base keyword visible, which is my gripe with the language.
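    The dual-meaning point above can be demonstrated with a small sketch. This is a toy model in Python, not C#; the explicit class-qualified calls stand in for where the C# compiler binds each base call at compile time, and the class names reuse the Alpha/Bravo hierarchy from the post.

```python
# Toy model: in one method body, two "base" calls can bind to two
# different classes, because each binds to the nearest class that
# actually implements the method.

class Alpha:
    def whatever(self):      return "Alpha.whatever"
    def whateverElse(self):  return "Alpha.whateverElse"

class Bravo(Alpha):
    def whatever(self):      return "Bravo.whatever"
    # no override of whateverElse

class Charlie(Bravo):
    def run(self):
        # base.whatever()     -> nearest implementation is in Bravo
        # base.whateverElse() -> nearest implementation is in Alpha
        return (Bravo.whatever(self), Alpha.whateverElse(self))

print(Charlie().run())  # ('Bravo.whatever', 'Alpha.whateverElse')
```

Within Charlie.run, “base” means Bravo for one call and Alpha for the other – the two meanings Focus objects to, side by side.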

  134. Random User 423680 says:

    Unless I misunderstand (which would not surprise me, since I am reading fast), the current design only breaks if a class in the middle of an inheritance chain changes its _implemented_ interface. Unfortunately, this is disguised by the virtual mechanism so that the _visible_ interface is unchanged.

    Regardless, changing the interface (even invisibly) is by definition a potentially breaking change. The party responsible for Bravo should adjust version numbers, strong names, etc. as appropriate to indicate that the new release is not hot-swap backward compatible in the case of an added interface element, and not hot-swap compatible at all in the case of a removed interface element.

    (For the purposes of this comment, I am trying to interpret the existing design. At the moment, I have no opinion on what is "right".)