A Definite Assignment Anomaly


UPDATE: I have discovered that this issue is considerably weirder than the initial bug report led me to believe. I’ve rewritten the examples in this article; the previous ones did not actually demonstrate the bug.

 Consider the following code:

struct S
{
  private string blah;
  public S(string blah)
  {
      this.blah = blah;
  }
  public void Frob()
  { // whatever
  }
}

This method body code fragment is legal (though probably ill-advised):

S s1 = new S();
s1.Frob();

Every struct has a default constructor which initializes its fields to their default values; this frobs the zero handle.

What about this?

S s2;
s2.Frob();

Looks like a definite assignment error, doesn’t it?

An interesting and little-known fact about the C# compiler is that this is reported as a definite assignment error if the struct is compiled from source code, but not if the struct is in a referenced assembly! What is the reasoning behind this seemingly anomalous behaviour?

Well, consider the following related situation:

struct SS
{
  public int x;
  public int y;
  public void Bar() {…}
}

Here we have the pure unadulterated evil that is a mutable value type. Is this legal?

SS ss;
ss.x = 123;
ss.y = 456;
ss.Bar();

Yes, that is legal. The local variable s essentially has two sub-variables, x and y. If x and y are both definitely assigned, then s is also considered to be definitely assigned. When checking definite assignment on a struct, the compiler actually checks to see whether we know that every field of the struct has been initialized. A constructor call is assumed to automatically initialize every field, but if there is not a constructor call, then we simply track every field independently and then pool the results. (And if the fields are themselves of value type, we do so recursively of course.)

The anomaly happens because when the compiler sees that the type in question is in source code, it tracks the definite assignment of all the fields, detects that “s1.blah” is an unassigned field, and gives a definite assignment error on s1. But when the type in question is from a referenced assembly, the compiler ignores inaccessible fields of reference type. We see that there were zero fields initialized out of the zero accessible fields that needed to be initialized, and declare that s1 must be fully assigned!

UPDATE: Initially I thought this behaviour was odd and undesirable but plausibly justifiable. However I have since learned that in fact, for reasons still obscure to me, the compiler only ignores inaccessible fields of reference type. Inaccessible fields of value type are still tracked for definite assignment. This weird inconsistency points strongly in the direction of this being a bug, not a feature. Continuing with my original text…

This behaviour is undoubtedly weird; whether it is desirable or not is a judgment call. There certainly is an argument to be made that the user has no ability to do anything about a private field on an imported type; a type they almost certainly did not author themselves, do not know the details of, and cannot change. Whereas if the type is in source code, then the user running the compiler has personal knowledge of its internals. So, the argument goes, we should give the user a pass on having to prove that the fields they cannot access and know nothing about are assigned; just let them be assigned to their default values and don’t worry about them.

There is also the (more compelling to me) argument that we ought to be consistent here and either consider all the fields all the time, or consider only the accessible fields all the time, rather than swapping back and forth between the two choices depending on the details of the compilation process. My choice would be the former; force the user to call the default constructor or use the “default(T)” operator if they really want the all-zero default version of the struct. That makes the intention very clear in the code.

Unfortunately, though I think I’ve argued that this behaviour is plausibly undesirable, we’re stuck with it. There are numerous BCL structs which have no public fields, and lots of existing code which uses them without calling the default constructor. Changing now to make this an error is a breaking change with no compelling benefit, and we’re not going to do that. Particularly since the anomaly is essentially harmless; whether we force you to say “s1 = new S()” or not, the handle field is in practice always initialized to its default value.

Nor do I think we should change this by making the code legal in all cases, even if the type is in source code; that’s making a bad situation worse merely in order to achieve consistency. This seems like the very definition of “foolish consistency”.

 

Comments (22)

  1. Leo Bushkin says:

    This is an interesting case. I was not aware there were situations where the compiler would allow you to declare a local instance without either initializing it or calling a constructor explicitly. Back in my C++ days, I recall many a time when local instances would be declared using a constructorless notation:

    S s1;  // default constructor automatically called

    I always found this notation confusing – particularly since to following is also legal, but much more obvious:

    S s1 = S(); // also legal but clearer

  2. Pavel Minaev says:

    @Eric:

    When compiling the version which is in the same assembly, does it complain regardless of the contents of Frob(). Or does it actually check whether Frob() references fields, and complain about those specifically?

    To be more precise, say I have Frob implemented thus:

      void Frob() { handle = …; }

    Would it still complain about the lack of definite initialization? If not, and if I call Frob on uninitialized handle, will it recognize s.handle as initialized after Frob returns for the purposes of definite initialization? Or does it generally assume that any method call on a struct will access all its fields?

    (I kinda have to ask because the behavior that you describe only makes some sense to me if compiler actually does cross-method boundary analysis for definite assignment for methods for which it has the source code).

    @Leo: your second C++ snippet requires a copy constructor to exist (even if it’s optimized away, which it legally can be), while the first one does not, so there is a difference. I don’t really have a problem with the parentheses-less syntax as such; the real evilness of C++ is the distinction between POD and non-POD, and initialization thereof.

    We do not do cross-method analysis. Basically, it’s analyzed as though the code

    h1.Frob()

    was actually written as

    MyHandle.Frob(ref h1);

    Since we are passing a variable as “ref”, it must be definitely assigned before the method call. We check definite assignment on structs by verifying that all known fields of the struct are fully assigned. In the “import from metadata” case, we don’t even load information about private fields off of the disk, so as far as we know, its fully assigned. But in source code, we have all the private field info available in memory already, and we use it. — Eric

  3. Tom says:

    Is this behavior in line with the specification?

    Good question. The spec says “A struct-type variable is considered definitely assigned if each of its instance variables is considered definitely assigned.” Notice that it does not say “each of its accessible instance variables’. So technically, I suppose this is a violation of the specification. — Eric

  4. configurator says:

    Pavel does make a point though.

    Suppose I have this silly mutable value type:

    public struct Point {

       public int X { get; set; }

       public int Y { get; set; }

       public Color Color { get; set; }

       public void GetColorFromWhatever(Whatever) {

           Color = …;

       }

    }

    and I call

    Point p;

    p.X = x;

    p.Y = y;

    p.GetColorFromWhatever(whatever);

    Obviously (obvious to the programmer, not to the compiler), this is what I meant to do albeit a bit awkward. I don’t really want to assign color since it’s a calculated value. But I have to either do

    Point p = new Point();

    or

    p.Color = dummy;

    I was willing to accept that – I prefer calling the constructor anyway. But now you tell me that if Color was a struct in a different assembly it would work – now that suddenly doesn’t make sense.

  5. Darrell says:

    [quote]Every struct has a default constructor which initializes its fields to their default values[/quote]

    This is the real problem IMHO.  As far as I’m concerned, there should never be an implicit parameter-less (default) constructor – if the dev wanted the object to be called with a parameter-less constructor, it would be in the definition.  I’ve been bitten by this back in my C++ days (though for the life of me I don’t remember the context, only that it was a PITA to track down), and hoped that C# was immune to it.

    I’m sure there is some reason for it, but does that reason outweigh the problems?

  6. Neil Whitaker says:

    An important case to consider in this discussion is this one:

    S[] sArray = new S[1000000];

    If you require me to call S’s constructor, how will you make sure I call it for all the values of this array?

    In the case of arrays of classes, it’s simple, because they’re initialized to null, which means to class exists yet. But in the case of a value type, as soon as you allocate space, you have a struct.

  7. Gabe says:

    Neil: the only way to assure that all elements of an array get their constructors called is to have the compiler generate loops for each array to call the default constructor on each element. This would mean that every array element would get initialized at least twice — first to all-zeroes by the CLR, then by the default constructor, then possibly by the actual intended value for each element. That should pretty much answer Darrell’s question.

  8. pradeeptp says:

    Please clarify what you mean by referenced assembly. Here is what I did and I got the error “Use of unassigned local variable…”

    1) Created a console application project called “Assembly1″. Created the public struct MyHandle {…}

    2) Created a second console application project called “Client”.  Referenced “Assembly1″.  In Main() I wrote

    MyHandle h1 = new MyHandle();

    h1.Frob();

    //I got the error “Use of unassigned local variable…”

    I am using C# 3.0.

    That’s weird. There’s no unassigned local at all in that scenario. — Eric

  9. Rim says:

    Looks the same on C# 3.5, I also created 2 console applications in seperate solutions. In solution Test1 I put

    namespace Test1
    {
       public struct Class1
       {
           private int lala;
           public Class1(int lala) { this.lala = lala; }
           public void Foo()    { }
       }
    }

    Then I copy the Exe someplace else and reference that (so the Exe, not the project) in solution Test2:

    Test1.Class1 c;
    c.Foo(); // Use of unassigned local variable

    I’m probably missing something obvious here, but I can’t figure out what.

    Holy goodness, this just gets weirder. I’ll update the entry. — Eric

  10. Pavel Minaev says:

    Eric, in light of your update, does this:

    > Unfortunately, though I think I’ve argued that this behaviour is plausibly undesirable, we’re stuck with it.

    still apply, or is it now officially in the defect-to-be-fixed land?

    My argument stands: there are plenty of value types in the BCL that have reference type private fields and existing code that uses them as locals without explicit implementation. The buggy behaviour is safe and simply results in the fields being initialized to nulls, exactly as though the local were instead a field initialized to the default value. With no compelling up side to fixing it, and potentially large down sides, it’s an easy call to make; we won’t fix it. — Eric

  11. Gabe says:

    So the problem is just that private reference types in members of structs from other assemblies might be usable without the other assembly’s author explicitly setting them to null? I think I can live with that.

  12. pete.d says:

    I’m less familiar with the CLR spec than the C# spec. C#’s definite assignment rules make any implementation detail moot, but in the CLR, are we _guaranteed_ that value type local variables will in fact be initialized to zeroes?  Or is that simply a coincidental fact of the current implementation?

    I realize that, just as this bug won’t be fixed for fear of breaking a bunch of existing buggy BCL code, there may be an implicit requirement at this point that stack variables be initialized to zeroes. But does the spec actually require that?

    See Partition III section 3.4.7, “localloc”. I quote: “If the localsinit flag on the method is true, the block of memory returned is initialized to 0; otherwise, the initial value of that block of memory is unspecified.” — Eric

  13. pete.d says:

    Thanks for the pointer. So, the way I read that as it relates to my question is that the answer is “no, there is no guarantee locals will be initialized to zeros”. That is, the C# specification doesn’t, as near as I can tell, require that the run-time option “localsinit” be used (*), and so whether they are or not is entirely up to the specific implementation.

    IMHO, that could in fact be an argument for going ahead and fixing this bug, in spite of the effort that would be required in the BCL to fix up all the uninitialized variables.  After all, _some_ of those uninitialized variables could in fact be bugs themselves, even if not all are.  What better way to improve the overall quality of the BCL implementation than to take advantage of a correct implementation of the definite assignment rules?

    (*) Indeed, the C# specification seems to mainly be agnostic about a specific run-time implementation; the .NET Framework is assumed and of course certain primitives are explicitly used (System.Int32, System.Boolean, System.Threading.Monitor, etc.), but otherwise it seems that there’s no specific target language for the compilation output, never mind use of specific features of the target language.

    The C# compiler is responsible for implementing the semantics of the C# language, and the semantics of the language are such that it *should* be impossible for the runtime’s initialization implementation details of locals to be unobservable (modulo of course using debuggers to peer into variables before they are assigned to). Now, I think it is plausibly argued that what we have here is a bug; with this, one can observe the fact that our implementation of the compiler does in fact ensure that fields of struct locals are initialized to zeroes. But it would be rather strange for the spec to say “a correct implementation must ensure that local initialization details are non-observable, but an incorrect implementation is required to ensure that they’re initialized to zeroes”. The spec doesn’t say anything at all about what an incorrect implementation is required to do. — Eric

  14. Pavel Minaev says:

    > My argument stands: there are plenty of value types in the BCL that have reference type private fields and existing code that uses them as locals without explicit implementation. The buggy behaviour is safe and simply results in the fields being initialized to nulls, exactly as though the local were instead a field initialized to the default value. With no compelling up side to fixing it, and potentially large down sides, it’s an easy call to make; we won’t fix it.

    By "fixing" this I rather meant switching to "guaranteed null initialization for structs" (or any other intermediate semantics that would ensure consistent behavior between structs in current project and ones from referenced assemblies). I understand that a "proper fix" in the spirit of language spec would be out of question for compat reasons.

  15. Pavel Minaev says:

    @pete: there’s one other bit of Ecma-335 CLI spec that is relevant here to understand why assemblies produced by VC# always specify "localsinit" (Partition III, 1.8.1.1 "Verification"):

    "Verification assumes that the CLI zeroes all memory other than the evaluation stack before it is made visible to programs. A conforming implementation of the CLI shall provide this observable behavior. Furthermore, verifiable methods shall have the localsinit bit set, see Partition II (Flags for Method Headers). If this bit is not

    set, then a CLI might throw a Verification exception at any point where a local variable is accessed, and where the assembly containing that method has not been granted SecurityPermission.SkipVerification."

  16. Pavel Minaev says:

    While looking up all references to "localsinit" Ecma-335, I’ve found the following interesting bit (Partition II, 12.4.1 "Method calls"):

    "All method calls initialize the method state areas (see §12.3.2) as follows:

    2. The local variables array always has null for object types and for fields within value types that hold objects. In addition, if the localsinit flag is set in the method header, then the local variables array is initialized to 0 for all integer types and to 0.0 for all floating-point types. Value types are not initialized by the CLI, but verified code will supply a call to an initializer as part of the method’s entry point code."

    So it says that there’s no guarantee of default initialization for local structs, and that verifiable code is required to initialize them, either by assignment, or via initobj? But I cannot find any definite confirmation of that in section that describes the verification algorithm. It says that verifiable code must use localsinit, and that it also must guarantee no reads of uninitialized locals, but it stops short of saying whether doing the former guarantees the latter, or not.

    Then there’s also this (Partition II, 13.2 "Initializing value types"):

    "Like classes, value types can have both instance constructors (§10.5.1) and type initializers (§10.5.3). Unlike classes, whose fields are automatically initialized to null, the following rules constitute the only guarantee about the initilization of (unboxed) value types:

    • Static variables shall be initialized to zero when a type is loaded (§10.5.3.3), hence statics whose type is a value type are zero-initialized when the type is loaded.

    • Local variables shall be initialized to zero if the localsinit bit in the method header

    (§25.4.4) is set.

    • Arrays shall be zero-initialized.

    • Instances of classes (i.e., objects) shall be zero-initialized prior to calling their instance constructor.

    [Rationale: Guaranteeing automatic initialization of unboxed value types is both difficult and expensive, especially on platforms that support thread-local storage and that allow threads to be created outside of the CLI and then passed to the CLI for management. end rationale]"

    But on the other hand, there are several explicit mentions that having "localsinit" bit is sufficient to get an initialized local struct. For example, PIII, 12.1.6.2.1 "Initializing instances of value types":

    "There are three options for initializing the home of a value type instance. You can zero it by loading the address of the home (see Table 8: Address and Type of Home Locations) and using the initobj instruction (for local variables this is also accomplished by setting the localsinit bit in the method’s header). You can call a user-defined constructor by loading the address of the home (see Table 8: Address and Type of Home Locations) and then calling the constructor directly. Or you can copy an existing instance into the home, as described in §12.1.6.2.2."

    In a bunch of other places, it refers to a mysterious concept named "zero-initialization", or "initialize to 0". E.g. the above quotation of 13.2 – "local variables shall be initialized to zero" – or, say, PII 12.3.2 "Method state":

    "Local variables are initialized to 0 before entry if the localsinit flag for the method is set (see §12.2)."

    There are a few others scattered around, but what’s interesting is that nowhere I could find the actual definition of "zero-initialization" or "initialize to 0" that would be generic for all types.

    Hmm. It seems to me that the spec could be a bit more unambiguous here…

  17. Gabe says:

    So in order to observe this bug vis-a-vis the BCL, this would mean that the BCL has to have an assembly with a struct containing a private field of reference type, where the field is not initialized to null, and said struct is never instantiated by any code inside that assembly?

    Yes, yes and no. The third point is not required. — Eric

    While I can imagine that the BCL contains many structs with private fields of reference types, I’m having a hard time coming up with a reason that such a struct would never be instantiated inside the assembly where it’s declared. Can somebody come up with an example?

    The way the bug manifests itself is this. You are a third party developer and you write

        private static DictionaryEntry GetEntry()
        {
            DictionaryEntry entry;
            return entry;
        }

    Technically this is not legal, since we have a local variable whose value is returned before it is initialized. In practice, this has the same effect as if you had called the default constructor; the private key and value fields are set to null. 

    But if the authors of the assembly containing the System.Collections namespace tried to write that same method, the compiler would give an error. The buggy behaviour is that we allow it if you are a third party, but not if you are the author. Neither should be allowed.

    – Eric

  18. pete.d says:

    @Eric: "…it *should* be impossible for the runtime’s initialization implementation details of locals to be unobservable." Is there an extra negative in there? The statement makes more sense to me if "it should be impossible…to be observable".  And I agree with your conclusion.  My point isn’t that there should be some well-defined behavior given a bug, but rather that the lack of well-defined behavior is motivation to fix the bug, in spite of the apparent safeness.

    @Pavel: thanks for the references.  Still, based on all that it still seems that the conclusion is that there’s nothing in the C# specification that requires "localsinit", right? I’m a bit confused about the question of verifiable. On the one hand, the reply from Scott Wisniewski asserts that output from non-"unsafe" C# code must be verifiable, but of course lacking "localsinit" it’s actually not.

    I feel like I’m missing something here, but it’s escaping me at the moment.

    @Gabe: I haven’t surveyed the BCL implementation myself. But I trust Eric when he says that it’s filled with such examples.

    My concern (if you can even call it that…in reality, I’m happy to leave these details and decisions to the Microsoft engineers who own the BCL) is that while the worst bug would be to use a reference type variable that has garbage in it, it could still be a bug to use a reference type variable that has been initialized to null but should in fact have been initialized explicitly by the method’s code.

    Within that subset of potential bugs, there are two versions: attempt to use the null reference and getting an exception, and checking for the null reference and following the wrong code path.  Of course, it seems very likely that instances of the former version are either non-existent or extremely rarely experienced, since those produce an obvious and immediate symptom. But the latter fails in a silent, not necessarily readily apparent way.

    I will readily grant that the intersection of all of the above likely is quite small, perhaps even an empty set. But there’s no way to know for sure.

    And frankly, not that this is a justification for certain BCL development decisions, but I do find it a bit disorienting to have recently read an article from Eric on the importance of having a spec, and then to read how it’s not important to follow the spec.  :)

  19. pete.d says:

    Oops. I left out a URL in my previous comment. In my response to Pavel, I was referring to this StackOverflow thread:

    http://stackoverflow.com/questions/1661150/c-compiler-generic-code-with-boxing-constraints

  20. Pavel Minaev says:

    C# specification doesn’t require CLI in the first place, so naturally it doesn’t say anything about "localsinit". However, it is clear that, in practical terms, VC# must be able produce verifiable assemblies given "sane" input (no /unsafe, no fancy StructLayout.Explicit tricks etc) – there are many important scenarios where it matters, such as ASP.NET or Silverlight.

  21. Pavel Minaev says:

    By the way, another undesirable side effect of this is that it makes changing the visibility of a struct field from private/internal to public a breaking change (on source code / recompilation level, not on binary level) for the clients, where it wouldn’t be one if you stick to the letter of the spec.

  22. pete.d says:

    "However, it is clear that, in practical terms, VC# must be able produce verifiable assemblies given "sane" input (no /unsafe, no fancy StructLayout.Explicit tricks etc) – there are many important scenarios where it matters, such as ASP.NET or Silverlight."

    So, the assertion is that for a given implementation of C#, because the target itself has a de facto requirement that each non-"unsafe" method be verifiable, that for that implementation we can assume that locals are initialized to zeroes.

    That, along with the fact that this is a bug in that given implementation of C# and not in the specification itself, means that this particular bug in this particular implementation is of no great concern, at least with respect to the question of accidently using uninitialized storage (I still assert that even initialized to null, but only implicitly, could in some cases result in incorrect behavior).

    By the way, a couple of things I noticed when I was playing around with this:

     • VS actually emits a "Use of unassigned local variable" error while one is editing the source file. It goes away when you actually compile, and returns if the file is edited again. Looks like the IDE’s own compilation logic got it right.

     • Even an unsafe method winds up with "locals init". Seems like the C# compiler likes the belt-and-suspenders approach.

    Anyway, it’s definitely an interesting issue.