A little more on bytes and endianness (byte order)

In my last post I talked about converting structs into byte arrays and vice versa.  There are a couple of related posts I've found that are worth reading:

The first post gives a nice example of converting a type (class, struct) from one byte order to the other.  The second gives a few different ways to convert between a byte array and an int.  Reading the second one, I thought it might be useful to point out two utility classes that are available in the framework: System.BitConverter and System.Buffer.

BitConverter lets you convert between an intrinsic type and a byte array.  GetBytes(intrinsicType) will convert to a byte array.  ToInt32(), ToChar(), etc., will pull an intrinsic type from a specified position in a byte array.
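
Here's a minimal sketch of that round trip.  The class and method names (BitConverterDemo, Pack) are made up for illustration; only the BitConverter calls come from the framework.

```csharp
using System;

public static class BitConverterDemo
{
    // Packs an int and a char back to back into one byte array.
    public static byte[] Pack(int number, char letter)
    {
        byte[] intBytes = BitConverter.GetBytes(number);   // 4 bytes
        byte[] charBytes = BitConverter.GetBytes(letter);  // 2 bytes (one UTF-16 code unit)
        byte[] packed = new byte[intBytes.Length + charBytes.Length];
        intBytes.CopyTo(packed, 0);
        charBytes.CopyTo(packed, intBytes.Length);
        return packed;
    }

    public static void Main()
    {
        byte[] packed = Pack(123456, 'A');

        // Pull each value back out from its starting position in the array.
        int number = BitConverter.ToInt32(packed, 0);
        char letter = BitConverter.ToChar(packed, 4);

        Console.WriteLine("{0} {1} ({2} bytes)", number, letter, packed.Length);
    }
}
```

Note that GetBytes() and the To*() methods both use the byte order of the machine the code runs on, so the round trip works anywhere but the bytes in between are not portable.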

Buffer will let you copy bytes from any type of array (not just byte[]) to another array.  It also allows you to get and set specific bytes in any type of array.
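
A quick sketch of both uses, again with invented names (BufferDemo, ToBytes).  The output comments assume a little-endian host; on a big-endian machine the bytes would come out in the opposite order.

```csharp
using System;

public static class BufferDemo
{
    // Copies the raw bytes of an int[] into a new byte[].
    public static byte[] ToBytes(int[] values)
    {
        int byteCount = Buffer.ByteLength(values);
        byte[] bytes = new byte[byteCount];
        // Offsets and count are measured in bytes, not in array elements.
        Buffer.BlockCopy(values, 0, bytes, 0, byteCount);
        return bytes;
    }

    public static void Main()
    {
        int[] values = { 1, 2 };
        Console.WriteLine(BitConverter.ToString(ToBytes(values)));
        // Little-endian host: 01-00-00-00-02-00-00-00

        // Get and set individual bytes directly inside the int array.
        Buffer.SetByte(values, 0, 0xFF);
        Console.WriteLine(Buffer.GetByte(values, 0)); // 255
        Console.WriteLine(values[0]);                 // Little-endian host: 255
    }
}
```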

Add those two to "things to remember" if they aren't already filed there. :)

 

On to other things... I started mentioning endianness in my last post.  Endianness refers to the byte order used when storing a data type that is made up of multiple bytes.  (Follow the link for more details.)  While the "PC" is little-endian, a number of other computers (Macintosh, although not for long) are not.  And although most files on the PC are written little-endian, a number of file formats still use big-endian ordering.  The .NET Framework mostly works only with little-endian data, so this involves a bit of work on your part.
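
To make the byte order concrete, here's a small sketch that prints the bytes of one int in host order and in big-endian order.  ByteOrderDemo and ToBigEndian are names I made up for the example; BitConverter.IsLittleEndian is the framework's own flag for the host's byte order.

```csharp
using System;

public static class ByteOrderDemo
{
    // Returns the bytes of value in big-endian order, whatever the host is.
    public static byte[] ToBigEndian(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value); // host order
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes); // least significant byte was first, so flip
        }
        return bytes;
    }

    public static void Main()
    {
        int value = 0x01020304;
        Console.WriteLine("Host is little-endian: {0}", BitConverter.IsLittleEndian);
        Console.WriteLine("Host order: {0}", BitConverter.ToString(BitConverter.GetBytes(value)));
        Console.WriteLine("Big-endian: {0}", BitConverter.ToString(ToBigEndian(value))); // 01-02-03-04
    }
}
```

On a PC the "host order" line shows 04-03-02-01: the least significant byte is stored first.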

Byte order only matters if you're talking about types that have more than one byte.  An ASCII string is comprised of a series of bytes, so no problem there.  Some Unicode encodings use multiple bytes and, as such, the byte order matters.  The framework does provide endian support for UTF-16 in the System.Text.UnicodeEncoding class.  See the link for more details.
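
As a rough illustration, the UnicodeEncoding constructor takes a flag for big-endian output.  Utf16Demo and Encode are hypothetical names; the second constructor argument only controls whether GetPreamble() reports a byte order mark, and GetBytes() does not emit one either way.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Encodes text as UTF-16 in the requested byte order, without a byte order mark.
    public static byte[] Encode(string text, bool bigEndian)
    {
        UnicodeEncoding encoding = new UnicodeEncoding(bigEndian, false);
        return encoding.GetBytes(text);
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode("A", false))); // 41-00
        Console.WriteLine(BitConverter.ToString(Encode("A", true)));  // 00-41
    }
}
```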

System.Net.IPAddress has methods that will potentially swap the byte order of integers:  NetworkToHostOrder() and HostToNetworkOrder().  The network order (by Internet Protocol standard) is big-endian, while the host order is whatever the machine uses: little-endian if you're on a little-endian host (say, the PC).  Even if you're positively sure you won't ever have your code on anything but a PC, it seems to be fairly bad form to lean on these methods for general-purpose endian conversion.  In addition, for 32-bit and larger values the methods aren't terribly efficient, as they call down to the smaller swaps:

public static short HostToNetworkOrder(short host)
{
   return (short)(((host & 0xFF) << 8) | ((host >> 8) & 0xFF));
}

public static int HostToNetworkOrder(int host)
{
   return (((IPAddress.HostToNetworkOrder((short) host) & 0xffff) << 0x10) |
      (IPAddress.HostToNetworkOrder((short) (host >> 0x10)) & 0xffff));
}

public static long HostToNetworkOrder(long host)
{
   return (long)(((IPAddress.HostToNetworkOrder((int) host) & 0xffffffff) << 0x20) |
      (IPAddress.HostToNetworkOrder((int) (host >> 0x20)) & 0xffffffff));
}
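
For reference, calling the framework methods looks like this.  NetworkOrderDemo is just a name for the example, and the first output line assumes a little-endian host; the round trip should hold on any host.

```csharp
using System;
using System.Net;

public static class NetworkOrderDemo
{
    public static void Main()
    {
        int host = 0x01020304;
        int network = IPAddress.HostToNetworkOrder(host);

        // Little-endian host: 01020304 -> 04030201
        Console.WriteLine("{0:X8} -> {1:X8}", host, network);

        // Converting back returns the original value.
        Console.WriteLine(IPAddress.NetworkToHostOrder(network) == host); // True
    }
}
```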

To be fair, the performance hit that you get isn't noticeable until you start doing millions of conversions.  On top of that, I am no performance guru and I'm sure my examples could potentially be tweaked for the better as well.  Following the lead of others, I mask out the bytes I want to move, shift them to the proper position, then OR them back together.  Here's the example for a uint:

/// <summary>
/// Endian swaps an unsigned int.
/// </summary>
/// <param name="source">The unsigned int to swap.</param>
/// <returns>The swapped unsigned int.</returns>
public static uint SwapUnsignedInt(uint source)
{
   return (uint)(((source & 0x000000FF) << 24)
      | ((source & 0x0000FF00) << 8)
      | ((source & 0x00FF0000) >> 8)
      | ((source & 0xFF000000) >> 24));
}

In my wrapped BinaryReader class I have a property for "endianness", and when I read a type out of the embedded BinaryReader I check said property and flip the bytes if needed.  That makes it relatively painless to read a TIFF, for example (which can be either way), as I can simply set the property to "BigEndian" if I need to and merrily go on my way.
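
A stripped-down sketch of the idea follows.  This is not my actual class: the names (ByteOrder, EndianBinaryReader, Endianness, ReaderDemo) are invented for the example, and it only covers two types.  It relies on BinaryReader always reading little-endian, and on a TIFF file starting with "II" (little-endian) or "MM" (big-endian), followed by the number 42 and a 4-byte offset.

```csharp
using System;
using System.IO;

public enum ByteOrder { LittleEndian, BigEndian }

// Hypothetical wrapper around BinaryReader with a switchable byte order.
public class EndianBinaryReader
{
    private readonly BinaryReader reader;
    private ByteOrder endianness = ByteOrder.LittleEndian;

    public EndianBinaryReader(Stream input)
    {
        reader = new BinaryReader(input);
    }

    public ByteOrder Endianness
    {
        get { return endianness; }
        set { endianness = value; }
    }

    public ushort ReadUInt16()
    {
        ushort value = reader.ReadUInt16(); // read as little-endian
        if (endianness == ByteOrder.BigEndian)
        {
            value = (ushort)((value << 8) | (value >> 8));
        }
        return value;
    }

    public uint ReadUInt32()
    {
        uint value = reader.ReadUInt32(); // read as little-endian
        if (endianness == ByteOrder.BigEndian)
        {
            value = ((value & 0x000000FF) << 24)
                | ((value & 0x0000FF00) << 8)
                | ((value & 0x00FF0000) >> 8)
                | ((value & 0xFF000000) >> 24);
        }
        return value;
    }
}

public static class ReaderDemo
{
    public static void Main()
    {
        // A big-endian TIFF header: "MM", 42, offset 8.
        byte[] header = { 0x4D, 0x4D, 0x00, 0x2A, 0x00, 0x00, 0x00, 0x08 };
        EndianBinaryReader reader = new EndianBinaryReader(new MemoryStream(header));

        // Both marker bytes are identical, so this read works in either order.
        if (reader.ReadUInt16() == 0x4D4D)
        {
            reader.Endianness = ByteOrder.BigEndian;
        }

        Console.WriteLine(reader.ReadUInt16()); // 42
        Console.WriteLine(reader.ReadUInt32()); // 8
    }
}
```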

That's about it for now.  I'm not going to list the rest of the swapping methods unless someone requests it.  Oh, and one last comment:  there are things I'm still not addressing, such as alignment.  Nothing is simple with interoperability. ;)