Why does BitConverter.IsLittleEndian return false on my x86 machine?


Welcome to CLR Week 2013, returned from its two-year hiatus.

A customer reported that when they checked with the debugger, BitConverter.IsLittleEndian reported false even though they were running on an x86 machine, which is a little-endian architecture.

ushort foo = 65280;
65280
BitConverter.IsLittleEndian
false
BitConverter.GetBytes(foo)
{byte[2]}
[0]: 0
[1]: 255

The bytes are extracted in little-endian order, despite the claim that the machine is big-endian. "I don't get it."

I didn't know the answer, but I knew how to use a search engine, and a simple search quickly found this explanation:

Reading a member from the debugger merely reads the value of the member from memory.

That simple statement hides the answer by saying what happens and leaving you to figure out what doesn't happen. Here's what doesn't happen: Reading a member from the debugger does not execute the code to initialize that member.

In the case of BitConverter, the IsLittleEndian member is initialized by the static constructor. But when are static constructors run? For C#, static constructors are run before the first instance is created or any static members are referenced. Therefore, if you never create any BitConverter objects (which you can't, since it is a static-only class), and if you never access any static members, then its static constructor is not guaranteed to have run, and consequently anything that is initialized by the static constructor is not guaranteed to have been initialized.

And then when you go looking at it in the debugger, you see the uninitialized value.
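
To make the rule concrete, here is a minimal C# sketch. The Probe and Tracker classes are hypothetical stand-ins invented for illustration; they are not the real BitConverter source. The sketch assumes a class with an explicit static constructor and shows that the constructor has not run until the first static member is used.

```csharp
using System;

// Hypothetical helper: records whether Probe's static constructor has run.
// It has no static constructor of its own, so reading it has no side effects.
static class Tracker
{
    public static bool ProbeInitialized;
}

// Hypothetical stand-in for a static-only class such as BitConverter.
static class Probe
{
    public static readonly bool Flag;

    // Explicit static constructor: the C# rules run it before the first
    // static member of Probe is referenced.
    static Probe()
    {
        Flag = true;
        Tracker.ProbeInitialized = true;
    }

    public static int Twice(int x) => 2 * x;
}

static class Demo
{
    public static string Run()
    {
        bool before = Tracker.ProbeInitialized; // Probe has not been touched yet
        int result = Probe.Twice(21);           // first use triggers static Probe()
        bool after = Tracker.ProbeInitialized;
        return $"before={before}, result={result}, after={after}";
    }

    static void Main()
    {
        // On the first call in a fresh process this prints
        // "before=False, result=42, after=True".
        Console.WriteLine(Run());
    }
}
```

A debugger that merely reads Probe.Flag from memory before the call to Probe.Twice would see the default value, false, for the same reason.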

Why doesn't the debugger execute static constructors before dumping a value from memory? Probably because the debugger wants to avoid surprises. It would be weird if you tried to dump a value from the debugger and the program resumed execution!

Now, when you ask the debugger to evaluate BitConverter.GetBytes(foo), the debugger has no choice but to execute application code, but that's okay because you explicitly told it to. But let's continue that debugging session:

ushort foo = 65280;
65280
BitConverter.IsLittleEndian
false
BitConverter.GetBytes(foo)
{byte[2]}
[0]: 0
[1]: 255
BitConverter.IsLittleEndian
true ← hey look

Your call to BitConverter.GetBytes(foo) caused code to execute, and then the CLR said, "Okay, but before I call this member function, I am required to run the static constructor, because those are the rules," and that resulted in the IsLittleEndian field being initialized to true.

The customer replied, "Thanks. The trick was finding the correct search terms."

I didn't think my choice of search terms was particularly devious. I simply searched for bitconverter.islittleendian false.

Bonus reading: The byte order fallacy by Rob Pike. "Whenever I see code that asks what the native byte order is, it's almost certain the code is either wrong or misguided."
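
The idea behind that article is that a byte stream with a defined order can be decoded with shifts, without ever asking what the native order is. Here is a rough C# sketch of that approach; the class and method names are made up for this example, and the input bytes are the ones from the debugging session above.

```csharp
using System;

static class ByteOrderDemo
{
    // Decode a 16-bit value whose bytes are stored least significant first.
    // The shifts operate on values, so the result is the same on any host.
    public static ushort DecodeLittleEndian16(byte[] data, int offset)
        => (ushort)(data[offset] | (data[offset + 1] << 8));

    // Decode a 16-bit value whose bytes are stored most significant first.
    public static ushort DecodeBigEndian16(byte[] data, int offset)
        => (ushort)((data[offset] << 8) | data[offset + 1]);

    static void Main()
    {
        byte[] bytes = { 0x00, 0xFF }; // the two bytes GetBytes(foo) produced above

        Console.WriteLine(DecodeLittleEndian16(bytes, 0)); // prints 65280
        Console.WriteLine(DecodeBigEndian16(bytes, 0));    // prints 255
    }
}
```

Neither method consults BitConverter.IsLittleEndian; the only thing the code needs to know is the order the stream was written in.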

Comments (27)
  1. IsLittleEndian being a field and not a trivially inlinable property (returning a constant value) is a design mistake to begin with.

  2. @xor88: It is, but only if you ignore the fact that .NET is designed to run on multiple architectures (x86/x64/ARM/PowerPC at least).

  3. I'll disagree with the assertion "almost certain" in the Bonus Reading section (after having duly read the linked article).

    It is common in our multi-processor embedded systems that one of them is big-endian while the others are little-endian. Several years ago I did an FPGA that implemented a communications pipe between them using the 16-bit native data busses, connecting them bit-for-bit through a 1K-deep FIFO. The BE processor wrote its 16-bit words one way, and the LE processor the other way.

    That project had a heated argument over who was going to do byte-swapping (SW or HW).

    Byte order doesn't matter… only if you 1) send data only as 8-bit bytes, and 2) agree ahead of time on the order in which bytes are transferred.

  4. Hm says:

    I would have assumed that this property is exposing an intrinsic constant of the runtime engine, so there is no need to run a static constructor to get this value. (At least, the runtime engine should know the endianness of the processor it emits machine instructions for.)

    In pseudo-VB.NET:

    ReadOnly Class Property IsLittleEndian as Boolean
      Get
        return Environment.IsLittleEndian
      End Get
    End Property

    This is still trivially inlinable, and would also work in the debugger.

  5. The source code of the .NET framework could just have

    #if x86 || x64
    return true;
    #else
    return false;
    #endif

    as the property body.

  6. Joshua says:

    Rob Pike needs to learn about the hton? family of functions (htons, htonl, htonq).

    @Hm: I would have done it with a static initializer inside #ifdef, which results in the behavior actually observed.

  7. McKay says:

    @Karellen

    You want a debugger to run all static constructors each time? The number of static constructors is theoretically infinite. Maybe what you want is just a certain subset of them? If that's the case, put those in program startup.

  8. Julien says:

    @Karellen, adding to what McKay said, this also opens things up for some interesting Heisenbugs, where the program you're debugging behaves differently in the debugger than in the normal runtime. Yeah, it should not actually behave differently in that case, but if your program would not have any bugs you probably would not run it in the debugger in the first place…

  9. Adam Rosenfield says:

    Java also has a similar stipulation, where class static constructors are run before the first instance is created or before the first static method is called.  If the class static constructor throws an exception, then it terminates the program, since it's impossible to catch that exception anywhere — you'd have to put a try/catch block around every usage of the class, which would be totally insane.

    Also, CLR Week >> Batch File Week

  10. Julien says:

    @McKay, by the way, why are you saying that the number is theoretically infinite? While it is theoretically unbounded (meaning that there may be arbitrary many static constructors, practical considerations like available resources aside), I'm (genuinely!) curious when a program may have infinitely many static constructors. Generics/templates maybe?

  11. Yuri says:

    I totally agree with Rob on this one. Management think it's trivial to detect bit endianness, architecture, OS version, .NET Framework version etc… But my experience has shown that this type of code will invariably fail in different environments.

  12. Danny says:

    And because you referenced that Stack Overflow article here, somebody from here added a new answer there linking to this post, making a circular reference :).

  13. Nick says:

    Glad to see CLR week make a return.

    It would be remiss not to mention the great series on static constructors that Eric Lippert did earlier this year: ericlippert.com/…/static-constructors-part-one  It helped me clarify how and when static initialization happens in .NET.

  14. John Doe says:

    Next best thing after byte-order is alignment.

    Curiously, many (most? all?) binary RPC formats have been devised with specific machine performance in mind. They must all be wrong, somehow.

    XDR-style RPC message formats will always be a waste of time for little-endian machines. It will also be a waste of time for other-than-4 aligned machines (these were rare back in 2006 (?), see for yourself: tools.ietf.org/…/rfc4506 )

    NDR-style RPC message formats seem to take the best and worst of both worlds. When encoding the message, indicate whatever endianness and alignment/packing the message was built with, allowing C/C++ structs to be sent directly to the network. The receiving end must have all the code necessary to decode the message in a generic way, but when the architectural configuration (again, endianness and alignment/packing) is the same as the message's, it can again just read from the network. This makes reads and writes across the network as efficient as possible when the configurations match, and it still allows common cases, e.g. same alignment but different endianness, to be a bit optimized. One might think that the worst case of decoding is server-side only, but that's not true, as the messages from the server to the client are done exactly the same way, so both have to have generic decoders.

    But NDR in MSRPC might just mean little-endian, depending if type serialization is version 1 ( msdn.microsoft.com/…/cc243890.aspx ) or 2 ( msdn.microsoft.com/…/cc243889.aspx ). Attaboy.

  15. CornedBee says:

    Or the debugger could take a third option and somehow indicate that it is displaying a static member of a class whose static constructor has not yet run.

  16. @CornedBee: Is the debugger smart enough to know all the code executions paths that could have taken place while the process was running (before being halted)? Is it smart enough to know if members have been initialized yet?

  17. Karellen says:

    I can't tell for certain from that link, but do the static constructors have to run *immediately before* the first instance is created or any static members are referenced, or can they run *any time before* the first instance is created or any static members are referenced? Bullet point 4 ("The user has no control on when the static constructor is executed in the program.") suggests the latter.

    If it is the case, it seems it would be permissible for the debugger to run all static constructors immediately on program startup/library load time/attach time. Yes, it could slow start-up time somewhat, but *you're in the debugger*, and can probably afford the hit in that situation. Or, it could be an optional behaviour, or a command you could give the debugger at any time ("run all un-executed static initialisers now")

    Is there a reason that this is not done? (Other than "the default state of all possible features is "not implemented", and they start with -100 points")

    (@Joshua – did you read the linked article? Also, Rob Pike may well have *written* the [hn]to[nh][lsq](3) family of calls :-))

  18. John Doe says:

    @Brian_EE, it's smart enough to tell if a class's static constructor has run: blogs.msdn.com/…/51348.aspx

  19. ErikF says:

    @John Doe: My guess is that POD readonly getters are optimized away, so there's no method to run: in that case, you'd just have a plain variable with no magic.

  20. voo says:

    "For C#, static constructors are run before the first instance is created or any static members are referenced"

    Note though that as usual the as-if rule applies.

    class Foo {
       public static int x;
       public static int y = 5;

       static { // Java syntax because I'm way too lazy to look up how that worked again in C#
           x = 1;
       }
    }

    If the only thing you're doing is read y, the compiler/JIT are perfectly fine to *not* call the static constructor (as a matter of fact at least javac will inline Foo.y – a really stupid decision by the spec – so this is not even as theoretic as you'd think). Which makes this whole thing potentially even more surprising to people.

  21. Mike Dimmick says:

    The declaration in Rotor (Shared Source CLI) 2.0 has a #if BIGENDIAN and sets 'public static readonly bool IsLittleEndian' to true or false as appropriate. The CLR rules say to initialise static members in the static constructor.

    You can't use 'const' because the CLR rules say that the compiler must replace uses of that symbolic constant with that value. This means that if the declaration *had* used const, you would get different results depending on whether you compiled your C# program to IL on a big- or little-endian machine. And that would be bad.

  22. CWO says:

    From the embedded software side I totally disagree with Rob: In fact he presents the two lines you would enclose in #ifdef BIGENDIAN and says you won't need more than that. But it is important to know the encoding of the byte stream (or whatever) and the encoding of the machine your code is running on. Otherwise you'll never get usable values out of the stream. And most of the time it is not sufficient just to say "whoohoo I've got an int, what matters the content?"

    Also the point of being able to address a single byte of an int or not: this is a point the compiler has to take care of. Maybe it will produce massive overhead, but the compiler will get the job done.

  23. John Doe says:

    @ErikF, then guess again.

    First, this is not a getter, it's a public static field: msdn.microsoft.com/…/system.bitconverter.islittleendian.aspx

    Second, CLR's rules of static class constructor invocation are very clear, any access to a static or instance member (field, method) will trigger it. But remember that we're inspecting this static field in the debugger which probably uses reflection or a lower level mechanism, not in normal evaluation.

    There's also a slight difference between having a static constructor or not, even if there are field initializers: http://www.yoda.arachsys.com/…/beforefieldinit.html

  24. Carsten says:

    Lots of people here have misunderstood Rob Pike's article. What he is essentially saying is that endianness is only an issue when you're doing I/O. After you've done your I/O, you do not need to care about endianness. So yes, the stream you're reading must have defined an order of the bytes coming in. You can read these bytes with 100% portable code (as he demonstrates) without knowing the machine's byte order. #ifdef's are not necessary.

    If you find yourself disagreeing with Rob, you really, really need to go back and re-read the post and understand what he's saying.

  25. mockmyberet says:

    @Carsten: Amen.

    I thought it was brilliant. "//Computer// byte order" should never matter to you.

  26. mockmyberet says:

    And to whoever didn't agree with the assertion of "almost certain," that "assertion" wasn't an "assertion," that's why he says "almost." "Almost" is the opposite of assertion. Assertion is definite, he states "almost certain" _because_ there are exceptions and, in that statement of "almost," he is qualifying that. A qualifying statement is quite the opposite of an asserted statement. Most programmers being in their right minds would refrain from making solid, blanket or definitive statements.

  27. Joshua says:

    It would have been nice if the hton? and ntoh? family took char * and dealt with all the machine alignment problems as well as byte order.
