Why Do Initializers Run In The Opposite Order As Constructors? Part Two


As you might have figured out, the answer to last week’s puzzle is “if the constructors and initializers run in their actual order then an initialized readonly field of reference type is guaranteed to be non-null in any possible call. That guarantee cannot be met if the initializers run in the expected order.”

Suppose counterfactually that initializers ran in the expected order, that is, derived class initializers run after the base class constructor body. Consider the following pathological cases:

class Base
{
    public static ThreadsafeCollection t = new ThreadsafeCollection();
    public Base()
    {
        Console.WriteLine("Base constructor");
        if (this is Derived) (this as Derived).DoIt();
        // would deref null if we are constructing an instance of Derived
        Blah();
        // would deref null if we are constructing an instance of MoreDerived
        t.Add(this);
        // would deref null if another thread calls Base.t.GetLatest().Blah();
        // before derived constructor runs
    }
    public virtual void Blah() { }
}
class Derived : Base
{
    readonly Foo derivedFoo = new Foo("Derived initializer");
    public void DoIt()
    {
        derivedFoo.Bar();
    }
}
class MoreDerived : Derived
{
    public override void Blah() { DoIt(); }
}

Calling methods on derived types from constructors is dirty pool, but it is not illegal. And stuffing not-quite-constructed objects into global state is risky, but not illegal. I’m not recommending that you do any of these things — please, do not, for the good of us all. I’m saying that it would be really nice if we could give you an ironclad guarantee that an initialized readonly field is always observed in its initialized state, and we cannot make that guarantee unless we run all the initializers first, and then all of the constructor bodies.
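
The actual ordering can be observed with a small program. This is only a minimal sketch: the Recorder helper, the Foo class and the message strings are hypothetical names made up for the illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records the order in which things happen.
static class Recorder
{
    public static readonly List<string> Events = new List<string>();

    public static void Note(string message)
    {
        Events.Add(message);
        Console.WriteLine(message);
    }

    // Notes the message, then passes the value through so the call can sit in a field initializer.
    public static T Note<T>(string message, T value)
    {
        Note(message);
        return value;
    }
}

class Foo
{
    public string Name { get; }
    public Foo(string name) { Name = name; }
}

class Base
{
    protected readonly Foo baseFoo = Recorder.Note("Base initializer", new Foo("base"));

    public Base()
    {
        Recorder.Note("Base constructor");
        Blah(); // virtual call: Derived.Blah runs before the body of Derived's constructor
    }

    public virtual void Blah() { }
}

class Derived : Base
{
    readonly Foo derivedFoo = Recorder.Note("Derived initializer", new Foo("derived"));

    public Derived()
    {
        Recorder.Note("Derived constructor");
    }

    public override void Blah()
    {
        Recorder.Note("Blah sees derivedFoo " + (derivedFoo == null ? "null" : "non-null"));
    }
}

static class Program
{
    static void Main()
    {
        new Derived();
        // Prints, in this order:
        //   Derived initializer
        //   Base initializer
        //   Base constructor
        //   Blah sees derivedFoo non-null
        //   Derived constructor
    }
}
```

The virtual call still runs Derived's code before the body of Derived's constructor, which is the dirty pool described above, but the initialized readonly field is already in place when it does.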

Note that of course, if you initialize your readonly fields in the constructor, then all bets are off. We make no guarantees as to the fields not being accessed before the constructor bodies run.
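
To make the contrast concrete, here is a minimal sketch with two derived classes that differ only in where the readonly field is assigned. All of the class and member names (Foo, Report, SawNull and so on) are hypothetical and exist only for this illustration.

```csharp
using System;

class Foo { }

class Base
{
    public Base()
    {
        Report(); // virtual call made before any derived constructor body has run
    }

    public virtual void Report() { }
}

class InitializedInline : Base
{
    readonly Foo foo = new Foo(); // field initializer: runs before Base()

    public bool SawNull { get; private set; }

    public override void Report() { SawNull = foo == null; }
}

class InitializedInConstructor : Base
{
    readonly Foo foo;

    public bool SawNull { get; private set; }

    public InitializedInConstructor()
    {
        foo = new Foo(); // constructor body: runs after Base() has returned
    }

    public override void Report() { SawNull = foo == null; }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine("inline initializer: saw null = " + new InitializedInline().SawNull);
        // prints "inline initializer: saw null = False"
        Console.WriteLine("constructor assignment: saw null = " + new InitializedInConstructor().SawNull);
        // prints "constructor assignment: saw null = True"
    }
}
```

Both fields are readonly, yet only the one with an initializer is guaranteed to be observed in its initialized state.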

Next time on FAIC: how to get a question not answered.

 

Comments (42)

  1. Richard says:

    Carrying on from the previous article… I think the C++ model of (as you put it) "objects that mutate their own runtime type" is appropriate. The constructor is what makes an object of a type. Until the constructor is run, the type’s invariants aren’t met (a type theorist would say that it’s not yet of that type). Once the destructor has run, the invariant is once again not met (it’s no longer of the fully-derived type). This leads to some surprises (pure virtual function calls being the obvious ones).

    On the other hand, to me, the CLR model is deeply weird. As I understand things, even before my constructor runs, my member functions can be called, and my member variables can be read or even changed. Any hope of a well-defined notion of a class invariant is lost. Now, you could argue that the problem is the same in both cases — that essentially, trying to treat a not-yet-constructed object as its derived type before the derived constructor is run is simply an error — but I would disagree. In C++, you can’t get into trouble without explicitly (static_)casting to the derived class, but in C#, you can get into trouble if you call a virtual function from the base class’s constructor.

  2. EricLippert says:

    Consider this code:

    abstract class B { public B() { if (this is D) Foo.Prop = this as D; } }

    class D : B { }

    You’re telling me that when B’s constructor runs, "this is D" should return false, but once control leaves B, suddenly it starts being true?

    And you’re telling me that it should be possible for an object to report that its current runtime type is abstract?

    That’s weird, man. :-)

    The constructor doesn’t make an object of a particular type. The allocation of the object makes it of a particular type. The constructor is just a method that runs on the object.

  3. Alex Simkin says:

    To Eric: If constructor is just a method, why bother to have constructors at all?

  4. Motti Lanzkron says:

    I strongly agree with Richard, I don’t see why Java and .NET took a different design decision than C++. I’ll admit that for a novice the C++ behaviour is strange at first glance but looking deeper it makes much more sense. Not being a "type theorist" I still think that an object is not of type X until the X constructor has finished running.

    Proof that the .NET model isn’t consistent is that a hack in the initializers’ behaviour plugs a hole that regular constructors do not. As Eric said, all bets are off.

  5. Alex Simkin says:

    Bill Wagner published an article on the subject in Visual Studio Magazine in December, 2007

    http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2377

  6. EricLippert says:

    > To Eric: If constructor is just a method, why bother to have constructors at all?

    An excellent question. There are languages which have no constructors — you want to run code to initialize an object, you go right ahead and run that code.

    The reason we have constructors is because the "run a particular method exactly once when an object is created but never again" is a very common pattern, so common that the designers of several languages have deemed it worthy of inclusion in and enforcement by the language and runtime.

  7. Alex Simkin says:

    To Eric: Then why not restrict calls from the constructor to base(), this() and static helpers?

  8. EricLippert says:

    Because then you end up with duplicated code. Consider a mutable object which represents an enumerator over some sort of collection. A common pattern is

    class C {

     public C() { Reset(); }

     public void Reset() { …  }

    }

    With your way, either you have to force the user to call Reset() after construction, which produces an opportunity for a bug, or you duplicate the code in Reset(), which is an opportunity for maintenance problems.

  9. Alexei Lebedev says:

    To Eric: You’re buying the ability to never see a NULL readonly member at the price of allowing to call derived class members before the derived class constructor. This means the derived class cannot implement an invariant that would hold throughout its lifetime (post-constructor and pre-destructor).

  10. EricLippert says:

    I’m not following you. How does the order of running initializers before constructors make it impossible to implement an invariant?

  11. Alex Simkin says:

    To Eric: There shall be no way to call instance methods before instance is constructed, so that post constructor will be the first one called after instance construction.

  12. Alexei Lebedev says:

    Well, the other possibility was to have "A is B" return false (because the object is not B yet) and thus disallow downcasting to a not-yet-constructed type.

    the way you have it, the invariant in  

    class A {

    A() { Invariant(); }

    public virtual void Invariant() {}

    };

    class B : public A {

    private int i;

    public B(int i_) { i=i_; }

    public void Invariant() { assert(i==1);}

    }

    does not hold if A calls Invariant(). In fact, strictly speaking it’s accessing an uninitialized variable. It’s just that the runtime system went and zeroed out all the fields, so they seem initialized.

  13. EricLippert says:

    Correct. That’s why it’s a bad programming practice to call virtual methods from constructors.

    The reason it is a bad idea to call virtual methods from constructors is because a method on a derived class might run before the derived class constructor runs.

    That is, the tradeoff made is we are trading the benefit of "an object is always of one type throughout its entire lifetime" against the benefit of "it is always safe to call a virtual method, even from a constructor".

    What I am confused about is your statement that this restriction on when you should call virtual methods is a _consequence_ of the fact that initializers run before constructors. That is saying it backwards. Rather, both the fact that you should not call virtual methods in constructors, AND the fact that initializers run before constructors, are _consequences_ of the fact that object type is not mutable.

    Does that make sense?

  14. apenwarr says:

    Okay, from this discussion, I can see why it makes sense to initialize class member variables of both Base and Derived before running any constructors; despite the assumption from the previous article, that behaviour didn’t surprise me at all.  Doing things in between calling the Base and Derived constructors would be more surprising to me.

    What I don’t understand is why it’s necessary to do Derived’s members before Base’s.  According to my experiments, member initializers can’t refer to "this" anyhow, so nobody can (accidentally) get any reference to any of either Base’s or Derived’s member initializers during the initialization phase.

    If that’s true, there seems to be no obvious reason to run Derived’s first; it’s irrelevant either way.  It will probably never affect me, but just for symmetry with constructors, running Base’s initializers before Derived’s seems to make more sense.

  15. EricLippert says:

    The implementation of the "reverse order" semantics is simple — have the constructor run the initializers, then call the base class constructor, then run the constructor body.

    Suppose you wanted the base class initializers to run before the derived class initializers. Imagine you are the compiler developer; how would you implement it?

  16. Alex Simkin says:

    To Eric: Have "hidden" initializer that runs base initializer then runs itself then you run constructor which calls base constructor.

    By the way, do you know why VB.NET specifies "Java style" initialization in its language definition? Did they have any reasons, or did they do it just because VB programmers (myself included) are stupid and won’t know better?

  17. EricLippert says:

    > Have "hidden" initializer that runs base initializer then runs itself then you run constructor which calls base constructor.

    And then how does the base constructor know that it doesn’t have to run its hidden initializer?  It had better not run it again, otherwise we’ve just initialized all the base stuff twice.

    > Do you know why VB.NET specifies "Java style" initialization in language definition?

    Nope. I have not attended VB design team meetings since 2001, and even then I was only there as an expert on the differences between VB and VBScript. I have no idea why they made the specific design choices that they did; you should ask a VB expert. Paul Vick, say.

  18. Alex Simkin says:

    Constructor doesn’t invoke initializer. Initializers invoke initializers, constructors invoke constructors.

    I asked Paul Vick; his principles are well known and published: Working in a natural way is a higher priority than language purity.

  19. Ollie Riches says:

    It’s like baking a cake!

    You source and buy the ingredients (initialising) before mixing & baking (constructing), and once it has finished baking you can use it for whatever purpose you intended, usually eating…

    It wouldn’t make sense to start mixing before sourcing the ingredients; you would get halfway through and realise you need to go to the shop for baking powder…

    Or maybe it’s just me who sees it like baking a cake…

  20. David Moles says:

    > Working in a natural way is a higher priority than language purity.

    I agree — but it’s not obvious to me which is more unnatural: running all initializers before all constructors, or having supposedly immutable fields take on different values during initialization/construction. I have to say after many years of Java I find the C# approach rather attractive.

  21. Alex Simkin says:

    At least I have found out why the MyClass keyword was introduced in VB.NET. It allows one to call the class’s own virtual functions even if they are overridden in the derived class. Thus one can simulate the C++ behavior by prefixing virtual function calls in the constructor with MyClass.

  22. EricLippert says:

    > Constructor doesn’t invoke initializer. Initializers invoke initializers, constructors invoke constructors.

    OK, so who gets the ball rolling?  You’ve got to tell the CLR in the metadata of the assembly which method to call when an object is constructed by "new".  You cannot tell it the method that runs just the initializers, and you cannot tell it the method that runs just the constructors. What are you going to do?

    I anticipate your answer — generate a third method that runs both, and have that be the "real" constructor.

    So in short, you’re suggesting that every constructor declaration potentially create three different methods, one which implements initializers, one which implements constructor bodies, and one which calls the other two.

    This added complication would not maintain any invariant about the class, since the order of initialization makes no difference to the class itself — the whole point is that the instance is not inspectable until after the initializers run. The only difference it makes is if there is a side effect in two or more of the initializers, and you care about the order in which those side effects are effected, and you want them to go base to derived.

    I do not see "side effects are effected in a different order" as a compelling reason to massively complicate the code generator for constructors. It’s complicated enough already, believe me!

  23. Alex Simkin says:

    I got your point. Thank you for your patience with all the clarifications.

  24. EricLippert says:

    You’re welcome! Thanks for asking a good question and bearing with me through the answer. :-)

  25. Alexei Lebedev says:

    To Eric: I retract my first comment, I didn’t understand the point about C# objects "always being of one type". Everything pretty much follows from that requirement. In fact, you could even initialize derived members first, too. Would create less of a surprise for those derived virtual functions that can be called before the constructor is.

  26. Welcome to the forty-first Community Convergence. The big news this week is that we have moved Future

  27. Ben Voigt says:

    Eric said: "You’ve got to tell the CLR in the metadata of the assembly which method to call when an object is constructed by "new".  You cannot tell it the method that runs just the initializers, and you cannot tell it the method that runs just the constructors. What are you going to do?"

    Have the newobj instruction call the initializer, then the specified constructor.  Since there is only one initializer for any type there’s no need to pass it to newobj.

  28. AC says:

    People, have a look at the IL, it’s quite informative and fairly straightforward.

    .method public hidebysig specialname rtspecialname

           instance void  .ctor() cil managed

    {

     // Code size       37 (0x25)

     .maxstack  8

     .language ‘{[deleted]}’

    // Source File ‘C:DevProgram_Original.cs’

    //000027:         readonly Foo derivedFoo = new Foo("Derived initializer");

     IL_0000:  ldarg.0

     IL_0001:  ldstr      "Derived initializer"

     IL_0006:  newobj     instance void EL.Foo::.ctor(string)

     IL_000b:  stfld      class EL.Foo EL.Derived::derivedFoo

    //000028:         public Derived()

     IL_0010:  ldarg.0

     IL_0011:  call       instance void EL.Base::.ctor()

     IL_0016:  nop

    //000029:         {

     IL_0017:  nop

    //000030:             Console.WriteLine("Derived constructor");

     IL_0018:  ldstr      "Derived constructor"

     IL_001d:  call       void [mscorlib]System.Console::WriteLine(string)

     IL_0022:  nop

    //000031:         }

     IL_0023:  nop

     IL_0024:  ret

    } // end of method Derived::.ctor

    Essentially, the compiler sets the type’s new() method to be

    exec initializers

    call base ctor()

    exec constructor code

    The JIT’er isn’t doing anything special at runtime. This is all compiled code. All the JIT’er guarantees is that new gets called once. There’s no special method called ‘run initializers’.

    The issue I see is that fields with initializers are available slightly before other fields. As in your example of calling a virtual method from within a constructor, a developer could really shoot themselves in the foot. Consider a change to released code: someone decides to move code from an initializer into the constructor body. Virtual methods in subclasses may now fail if they, purposely or otherwise, depend on this fragile ordering dependency. You’re at the mercy of the base class coder, compiler writers, and colleagues who don’t understand the subtlety of what’s happening. Before reading this I hadn’t really thought about it either.

    Calling a virtual method from a constructor is bad practice, but since you can do it, people will.

    At least it’s possible to detect this in code, but very annoying to have to do. I’m not sure what I’d rather have happen, and I don’t feel that preventing virtual methods would be better. There are some cases where calling virtual methods from a constructor could be valid. (e.g. variant of the specification pattern)

    Perhaps it would be better to have the compiler use the following order

    call base ctor()

    exec initializers

    exec constructor code

    Better is only better because it’s more predictable. A virtual method in a subclass that uses a field will fail predictably if called from the constructor, instead of just sometimes. It should be the case that setting a field in an initializer versus a constructor should be completely transparent. It ought to just be syntactic sugar, but as you’ve pointed out, it’s not.

    Regards

  29. As a long time (35+ years) developer, I have been involved in this debate since C++ was introduced.

    There are definite pros and cons to both approaches, and the safest bet is to never call virtual methods from within the constructor. Unfortunately it is very difficult (even with FxCop) to get a call flagged when it goes to a non-virtual member that in its body calls a virtual method.

    One BIG advantage of the C++ model applies to the development of library code. In the C++ model (initializers run right before the opening brace of the constructor), the initialization behavior of a base class is 100% invariant with regard to the derived class. This definitely produces more predictable results. However it does also impose limitations

  30. Cerror says:

    In C++, if we check the type of an object which is being created, we get an interesting answer. "*this" is of type Base until the Base constructor has finished. And a line like "if (this is Derived)" would never appear in C++, because the Base class should never know what Derived will do. If an object can have different types during its lifetime, this problem is solved.

  31. EricLippert says:

    It is conceptually bizarre that the type of an object varies throughout its lifetime. It is also conceptually bizarre that you can end up with an instantiated object of an abstract type that is not of a more specific concrete type.  

    The cure is worse than the disease, in this case.  In C#, objects always have the type that you asked for from the moment of their creation, and you will never see an instance of an abstract type that is not also of a concrete type.  The C++ way of solving this problem is just plain weird.  I’m not aware of any other OOP language that does it that way.

  32. Duncan Smith says:

    I recently ran Analyze over a core library and found that my base constructor was assigning a value to an abstract property, i.e. calling a virtual method. By design, my derived class wanted to maintain the private field, which doesn’t seem such a good idea anymore when the property is to be accessible via the base class. Refactoring meant relocating the private field to the base class, providing a virtual base implementation for the public property and then overriding this in the derived class to get the behaviour I needed.

    Analyze gave no more than a warning. Yet I was very impressed to get that much.

  33. Richard says:

    > It is conceptually bizarre that the type of an object varies throughout its lifetime.

    I don’t see that, myself. When one constructs, say, a Door, there’s a period in time when it’s just an Aperture (which for the purpose of this example is the base class of Door). During that period, if I try to travelThrough() the object, I want to do the Aperture thing, not the Door thing (the latter would fail, since the object has no handle to pull yet).

    Or, look at it another way. Let’s view a type as a set of values. Base class X is a pair of ints (a, b), such that a <= b. Derived class Y has another int c, such that a <= c <= b. Let’s suppose we construct an object of eventual type Y, passing in (3, 5, 4) as (a, b, c). First, an X base object is constructed with values (3, 5). This base object is of type X, since it’s in the set of possible Xs. But the complete object is not of type Y yet, since (in CLR) the value of (3, 5, 0) is not in the set of possible Ys. Only once Y’s constructor has run and set c such as to enforce Y’s invariant is the complete object in the set of possible Ys and as such of type Y.

    > It is also conceptually bizarre that you can end up with an instantiated object of an abstract type that is not of a more specific concrete type.

    Yes, that certainly goes against a lot of conventional teaching ("abstract means not instantiatable"). But that’s the cost of allowing constructors and destructors to enforce class invariants. Which (for C++ at least) is invaluable.

    On the other hand, I find it conceptually bizarre to allow methods on an object of type D to be called before D’s constructor is called, and before D’s members have been constructed.

  34. Swanny says:

    Thanks Eric. This was a useful refresher for me.

    Sure, C++ does things differently and I still have a lot of fondness for that language even if I get to use it less and less these days. But I’m just as happy with how Java/.Net do things. I think the important thing here is to know how the language you’re using works, what the limits are and what the benefits are.

    I don’t think that either method is universally right or wrong, just different.

    Eric should be commended for his perseverance with this discussion.

  35. Sean says:

    Richard brings up a potentially elegant solution. Within the body of the constructor, calls to virtual methods on the object being constructed could be treated by the CLR as non-virtual. Then again, this doesn’t solve the multi-threading issue that Eric pointed out.

  36. piers7 says:

    Eric: I think you should probably put an explicit disclaimer in the article that mentions that this scheme isn’t what VB.Net follows, something I’ve previously discovered to my cost…

    [I’ve assumed that the difference is just that VB.Net doesn’t really use field initializers, because they have looser rules on what can be used in them, and instead just rolls all that stuff into the constructors. As a result the order of ‘field initializers’ effectively gets reversed from C#. But that’s an educated guess not even backed by bothering to look at the IL, so don’t hold me to it]

  37. Paul Schwartzberg says:

    I think the C# model of class initialisation can be seen as (at least) a two-stage rocket.

    Construction I : the readonly, private and public members

    Construction II: the constructor.

    I & II have their own precedence and relationship "rules" vis-a-vis their class and other classes (in their inheritance tree or outside it).

    Not meeting ‘Type Theory’ might be a plus (as indicated by others).

    Agree with what others write about calling methods on a class whose constructor hasn’t run… Code is not written. It is rewritten. And refactoring on a framework with such stuff (that is usually not even commented) can be time consuming because it’s bug prone.

  38. Eugene says:

    Question: Can somebody provide a real life (non-pathological) example when all this (order of member initialization) matters?

    Opinion: 1) One must never provide a language feature that would require that much confused discussion. 2) If such highly-arguable feature has been provided one must never write code which uses that feature.

    Conclusion: never write code that depends on order of member initialization.

  39. Anders Cui says:

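    When we implement class inheritance and create an instance of a derived class, the instance fields of both the base class and the derived class have to be initialized and both of their constructors have to be called. In what order does all of this run? Let’s work through this quiz together.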

  40. Maddin says:

    [quote]

    Richard said:

    Carrying on from the previous article… I think the C++ model of (as you put it) "objects that mutate their own runtime type" is appropriate. The constructor is what makes an object of a type. Until the constructor is run, the type’s invariants aren’t met (a type theorist would say that it’s not yet of that type). Once the destructor has run, the invariant is once again not met (it’s no longer of the fully-derived type). This leads to some surprises (pure virtual function calls being the obvious ones).

    On the other hand, to me, the CLR model is deeply weird. As I understand things, even before my constructor runs, my member functions can be called, and my member variables can be read or even changed. Any hope of a well-defined notion of a class invariant is lost. Now, you could argue that the problem is the same in both cases — that essentially, trying to treat a not-yet-constructed object as its derived type before the derived constructor is run is simply an error — but I would disagree. In C++, you can’t get into trouble without explicitly (static_)casting to the derived class, but in C#, you can get into trouble if you call a virtual function from the base class’s constructor.

    [/quote]

    I think Richard meant that the .NET C# spec goes a CRTP pattern way where C++ does not. This is more intuitive, because it makes no deeper sense to treat an abstract class (interface) like a more derived one, for which .NET has the interface keyword to better distinguish between them.

    [quote]

    Alex Simkin said:

    To Eric: If constructor is just a method, why bother to have constructors at all?[/quote]

    Hehe… ctors are methods, methods are function with a this calling convention (ecx, rcx has pointer to class data on the stack or Heap), and functions are loosely linked only at compile time (association) are seen some time later in a page mixed with our data which was separated before to have one principal of object oriented design philosophy :-9.

    Object orientation is just a way to fool an asm progger and do the same another way :-)

    Martin

  41. Bogdan M. says:

    I think the problem lies in the fact that the initializers are in a sense constructors. Basically there are 2 types of constructors, running in opposite directions. I am not sure what real benefit the initializers bring. Don’t tell me it makes code shorter because it sounds like a bad joke. Readonly members? There are alternatives.

    You don’t have to duplicate code because you can create a private method that initializes the members. It cannot be overridden, it gives no headaches. You call it from all of the constructors and all is nice and leads to no contradictions or unexpected situations. As for calling methods of the derived class from the constructor, you can do it in C++ and it is idiotic. I don’t think this should trigger big changes in the language but rather in the brain of the guy who does it.

  42. Frank says:

    Sorry Eric, but you haven’t explained a solution to the actual problem. The interesting problem was not why constructors should be invoked after initialisers; I think everyone with a bit of analytical insight agrees that this is a sensible design decision. The actual problem was why initialisation should be done in a certain order.

    My conjecture here is that it doesn’t really matter much in which order initialisations are carried out as long as they are performed before all constructors, meaning that the order of initialisation is not a serious source of ambiguity and unexpected behaviour. The reason for this, I would say, is that prior to initialisation the object is not observable: it might already physically exist in memory but no reference to it is exposed yet. Besides, the aggregated objects should clearly have no knowledge of the context in which they are initialised either. The exposure of the object reference is what creates ambiguities and unexpected behaviour, and the earliest that happens is when the (first) constructor is called. Prior to this we cannot observe what happens, at least if confining our observation to the state of the object to be instantiated, thus it is irrelevant from a logical point of view. This is maybe not quite true: I could imagine pathological cases where instantiation of the aggregated objects might have some side effect that depends on their order, but this would have to involve some global state and seems rather contrived.

    I’m not really a (C#) programmer so might be missing something; if anyone had a genuine example where the order of initialisation prior to constructor invocation is observable in the sense described above, and more importantly creates potential for misunderstanding and errors of the sort that happen when interleaving initialisation and constructor calls, that would be of interest to justify this design decision. Btw even if there is no favourable order it may still make sense to impose one to avoid unintentional exploitation of unspecified behaviour.

    Best,

    Frank