The Beginning of the Endian


So the rumors have been true. Apple is moving to Intel processors, which, in some ways, makes sense. At least it does from Apple’s point of view. The Kool-Aid has never really affected their ability to figure out what’s right for Apple regardless of the direction they’ve taken.

For users, the transition won’t be all that painful either. Indeed, the prospect of being able to run VirtualPC on a native Intel processor has me, as a user, licking my chops. Buy a Mac and you’ll get the best of all three worlds: Mac, Unix and Windows.

When you’re a developer, though, the Kool-Aid can be rather heady. For those of us who’ve been around for a while, developing software for Apple’s computers has often been rather like being a farm animal with a ring through its nose. In fact, the primary difference between Apple and Microsoft as platform vendors is that Apple generates excitement in the user base while Microsoft tends toward generating excitement in the independent software developer base.

So, when Steve Jobs starts talking about certain software developers having little trouble making the transition from the PowerPC to x86 processors, you can pretty well bet that the Kool-Aid is laced with all sorts of compounds that are not there for your benefit.

No doubt, some, perhaps even many, applications will have little difficulty making the transition. There is, however, a class of applications that will require a significant amount of work. The feature they all have in common: they produce documents, files that store data in binary form. To understand why, we need a little lesson in the history of microprocessor design.

Microprocessors have data registers. These are analogous to little scratch-pads on your desktop. When the computer needs to do some manipulation of data, it pulls that data out of memory and puts it into these little scratch pads. When the computer is done with that data, yet wants to save it for later use, it writes the data on the scratch pad out to memory.

Early microprocessors had 8-bit, or byte-sized, data registers. This was generally convenient, because memory is considered to be just a very huge array of bytes. However, when the first 16-bit microprocessors were designed, the size of the scratch pads, or data registers, was twice the size of a single, addressable piece of memory.

This introduced a design problem. When you read a 16-bit value from a byte-addressable data store, like memory, which byte do you load first? You have two choices. You can load the most-significant byte first (known as big-endian), or you can load the least-significant byte first (known as little-endian). The difference between big-endian and little-endian systems is often referred to as “byte-sex”. It’s a term I’ve used before, and, for those who’ve wondered what it means, now you know.

As luck would have it, the two most active designers of microprocessors at the time, Intel and Motorola, answered that question in opposite ways. I’m not well-versed in VLSI design, but I have little doubt that each group of chip designers had legitimate reasons for the choices they made. None of that is important here. What is important is that, way back in those days, Apple chose Motorola processors while IBM chose Intel. Thus began the great endian divide.

Taken independently of each other, neither choice presents a significant issue. Big-endian is slightly more difficult to work with than little-endian, but the parameter type-checking provided by modern compilers renders the problem negligible. The real problem starts when binary data gets saved to disk and shared.

To see how this works, imagine a program that keeps track of people’s salaries. Your boss decides to give you a raise, so this program reads your salary from memory into one of these data registers, adds your raise to it, and then writes it back out to memory. At the end of the day, the program stores all employee records to disk as just an array of bytes, exactly as they reside in memory. Let’s also pretend that the accountants use computers with Intel processors, so the data gets written in little-endian format.

Now suppose you want to check that you got the correct raise. You fire up a program on your Mac that reads your employee record, but when it reads your salary, it’s going to read the bytes of that number in the reverse of the order in which they were written on the Intel processor. If your new salary is supposed to be $128 a week, then, unless the data is byte-swapped when it’s read from the disk, your Mac will lead you to believe that your salary is really $2,147,483,648 per week. Now that’s a raise!

If you think this is messy, then know that I’ve oversimplified the problem significantly. Rarely do programs maintain various numbers in isolation from each other. Employee records often have other numbers associated with them, like hours, hire dates, number of children, exemption status, etc. These are kept in compound data structures generally known as records. Many of them are quite huge.

When transferring data between x86 and PowerPC based computers, each numeric value in each of these records needs to be byte-swapped. Code that doesn’t properly byte-swap data written on the other processor type has a bug. I call these “unsafe byte-sex” bugs. If you’re going to move documents with binary data between PowerPC and Intel processors, you have to write code that practices safe byte-sex.

For some developers, this problem will be huge. Word, for example, has a total of 176 distinct data structures in its binary file format, each of which needs to be properly byte-swapped whenever one of them is read from the disk. Word’s example is probably a bit extreme, but even 20 or 30 data structures requiring proper byte-swapping represents a good chunk of work for both developers and testers.

That said, life for Word will actually get quite a bit easier. Why? Because Word’s files are always written in Intel byte-sex. Given the number of unsafe byte-sex bugs I’ve had to fix over the course of my career working on Word, I’m actually looking forward to the day when I can forget about them almost entirely.

 

Rick

Currently playing in iTunes: Saint Augustine In Hell by Sting

Comments (21)

  1. Alicia says:

It’s all good for progs like Word, but what about Windows?

  2. mschaef says:

    So how does VirtualPC virtualize x86 on the G5, given that the G5 no longer has little endian support?

  3. Travis Owens says:

I also blogged on this subject, but I have a much more negative attitude about the whole thing, even though I would love to run Mac OS X 10.4 within a VMWare scenario.

  4. Joe developer says:

    I have ported lots of C code between systems with different endian conventions.

    It’s generally easy to do. One reason is that the mistakes tend to be easy to spot, like your example.

    Another problem is missing Altivec support. But of course Word probably doesn’t have much of that. 🙂

  5. Rajesh says:

    When we had to write our Word importer/exporter we first wrote it for Mac. So while compiling it for Windows we just had to disable the macro for byte swapping and it worked without any hitch.

Well, switching to Intel will have its own headaches due to architectural differences. I’m not a VLSI guy, but I would like to see some kind of emulation inside the CPU that can handle both kinds.

  6. Mike Dimmick says:

    I guess you’ll be making that transition from Metrowerks CodeWarrior to XCode and from CFM to Mach-O now, then. The ‘Universal Binary Programming Guidelines’ (at http://developer.apple.com/documentation/MacOSX/Conceptual/universal_binary/universal_binary.pdf) states that "Carbon applications based on the Code Fragment Manager Preferred Executable Format (PEF) must be changed to Mach-O."

  7. I’ve seen quite a few people comment in blogs that the endian issue I mentioned yesterday won’t turn…

  8. Travis Owens says:

Just when Apple finally has an OS that’s being fully embraced by its users, Apple goes and flips everything upside down yet again.

Time to update and recompile all your code. And this time I’m not so sure we’ll see any included emulation for older apps, but only time will tell.

  9. Err… This misses out on really rather a lot of computing history.

    The big/little endian byte ordering issue arose long before microprocessors came on the scene. It dates back to the days of mainframes and mini-computers. IBM used big-endian. DEC used little-endian.

    Intel and Motorola were very late to the party, and were just carrying on two traditions that were already long-established by that time.

  10. Carlos says:

    Some older processors, like the PDP-11, confuse things even more when storing 32-bit values: the bytes are stored in the order 2,3,0,1 (where 3 is most-significant) so they are neither big-endian nor little-endian.

  11. steamer25 says:

    …until the next time Apple decides they haven’t been thinking differently enough ;).

  12. Could you please clarify why big-endian is more difficult for you to deal with?

When you’re designing a processor, the choice of endianness isn’t a big deal; it just boils down to whether your microcode ascends or descends through the byte order, and whether you figure smaller addresses as higher or lower in your diagram… In any case, big-endian systems are usually easier to debug, as you don’t have to "unwrap" memory dumps…

  13. Rick Schaut says:

    Juan,

    Big-endian is slightly more difficult, because, on little-endian systems, you can pass the address of a short to a routine that expects a pointer to a long (and vice-versa). If you do that on a big-endian system, you’ll either overwrite the wrong word in memory or you’ll get a value back that’s way too large.
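    To illustrate with a hypothetical snippet (using memcpy to stand in for what sloppy pointer-passing effectively does): read a 32-bit value through a 16-bit-sized window, and a little-endian machine hands you the low half, so the mistake hides for small values, while a big-endian machine hands you the wrong half.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        uint32_t wide = 7;      /* a long-sized value */
        uint16_t narrow;

        /* Take only the first two bytes at wide's address, as code
           passing &wide to a routine expecting a short* would. */
        memcpy(&narrow, &wide, sizeof narrow);

        /* Little-endian: narrow == 7 (the low half, so the bug hides).
           Big-endian: narrow == 0 (the high half, the wrong answer). */
        printf("%u\n", narrow);
        return 0;
    }
    ```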

    Rick