It's the platform, Silly!

I’ve been mulling writing this one for a while, and I ran into the comment below the other day which inspired me to go further, so here goes.

Back in May, Jim Gosling was interviewed by Asia Computer Weekly.  In the interview, he commented:

One of the biggest problems in the Linux world is there is no such thing as Linux. There are like 300 different releases of Linux out there. They are all close but they are not the same. In particular, they are not close enough that if you are a software developer, you can develop one that can run on the others.

He’s completely right, IMHO.  Just like the IBM PC’s documented architecture meant that people could create PC’s that were perfect hardware clones of IBM’s PCs (thus ensuring that the hardware was the same across PCs), Microsoft’s platform stability means that you could write for one platform and trust that it works on every machine running on that platform.

There are huge numbers of people who’ve forgotten what the early days of the computer industry were like.  When I started working, most software was custom, or was tied to a piece of hardware.  My mother worked as the executive director for the American Association of Physicists in Medicine.  When she started working there (in the early 1980’s), most of the word processing was done on old Wang word processors.  These were dedicated machines that did one thing – they ran a custom word processing application that Wang wrote to go with the machine.  If you wanted to computerize the records of your business, you had two choices: You could buy a minicomputer and pay a programmer several thousand dollars to come up with a solution that exactly met your business needs.  Or you could buy a pre-packaged solution for that minicomputer.  That solution would also cost several thousand dollars, but it wouldn’t necessarily meet your needs.

A large portion of the reason that these solutions were so expensive is that the hardware cost was so high.  The general purpose computers that were available cost tens or hundreds of thousands of dollars and required expensive facilities to manage.  So there weren’t many of them, which means that companies like Unilogic (makers of the Scribe word processing software, written by Brian Reid) charged hundreds of thousands of dollars for installations and tightly managed their code – you bought a license for the software that lasted only a year or so, after which you had to renew it (it was particularly ugly when Scribe’s license ran out (it happened at CMU once by accident) – the program would delete itself off the hard disk).

PC’s started coming out in the late 1970’s, but there weren’t that many commercial software packages available for them.  One problems developers encountered was that the machines had limited resources, but beyond that, software developers had to write for a specific platform – the hardware was different for all of these machines, as was the operating system and introducing a new platform linearly increases the amount of testing required.  If it takes two testers to test for one platform, it’ll take four testers to test two platforms, six testers to test three platforms, etc (this isn’t totally accurate, there are economies of scale, but in general the principal applies – the more platforms you support, the higher your test resources required).

There WERE successful business solutions for the early PCs, Visicalc first came out for the Apple ][, for example.  But they were few and far between, and were limited to a single hardware platform (again, because the test and development costs of writing to multiple platforms are prohibitive).

Then the IBM PC came out, with a documented hardware design (it wasn’t really open like “open source”, since only IBM contributed to the design process, but it was fully documented).  And with the IBM PC came a standard OS platform, MS-DOS (actually IBM offered three or four different operating systems, including CP/M and the UCSD P-system but MS-DOS was the one that took off).  In fact, Visicalc was one of the first applications ported to MS-DOS btw, it was ported to DOS 2.0. But it wasn’t until 1983ish, with the introduction of Lotus 1-2-3, that PC was seen as a business tool and people flocked to it. 

But the platform still wasn’t completely stable.  The problem was that while MS-DOS did a great job of virtualizing the system storage (with the FAT filesystem)  keyboard and memory, it did a lousy job of providing access to the screen and printers.  The only built-in support for the screen was a simple teletype-like console output mechanism.  The only way to get color output or the ability to position text on the screen was to load a replacement console driver, ANSI.SYS.

Obviously, most ISVs (like Lotus) weren’t willing to deal with this performance issue, so they started writing directly to the video hardware.  On the original IBM PC, that wasn’t that big a deal – there were two choices, CGA or MDA (Color Graphics Adapter and Monochrome Display Adapter).  Two choices, two code paths to test.  So the test cost was manageable for most ISVs.  Of course, the hardware world didn’t stay still.  Hercules came out with their graphics adapter for the IBM monochrome monitor.  Now we have three paths.  Then IBM came out with the EGA and VGA.  Now we have FIVE paths to test.  Most of these were compatible with the basic CGA/MDA, but not all, and they all had different ways of providing their enhancements.  Some had some “unique” hardware features, like the write-only hardware registers on the EGA.

At the same time as these display adapter improvements were coming, disks were also improving – first 5 ¼ inch floppies, then 10M hard disks, then 20M hard disks, then 30M.  And system memory increased from 16K to 32K to 64K to 256K to 640K.  Throughout all of it, the MS-DOS filesystem and memory interfaces continued to provide a consistent API to code to.  So developers continued to write to the MS-DOS filesystem APIs and grumbled about the costs of testing the various video combinations.

But even so, vendors flocked to MS-DOS.  The combination of a consistent hardware platform and a consistent software interface to that platform was an unbelievably attractive combination.  At the time, the major competition to MS-DOS was Unix and the various DR-DOS variants, but none of them provided the same level of consistency.  If you wanted to program to Unix, you had to chose between Solaris, 4.2BSD, AIX, IRIX, or any of the other variants.  Each of which was a totally different platform.  Solaris’ signals behaved subtly differently from AIX, etc.  Even though the platforms were ostensibly the same, they were enough subtle differences so that you either wrote for only one platform, or you took on the burden of running the complete test matrix on EVERY version of the platform you supported.  If you ever look at the source code to an application written for *nix, you can see this quite clearly – there are literally dozens of conditional compilation options for the various platforms.

On MS-DOS, on the other hand, if your app worked on an IBM PC, your app worked on a Compaq.  Because of the effort put forward to ensure upwards compatibility of applications, if your application ran on DOS 2.0, it ran on DOS 3.0 (modulo some minor issues related to FCB I/O).  Because the platforms were almost identical, your app would continue to run.   This commitment to platform stability has continued to this day – Visicalc from DOS 2.0 still runs on Windows XP.

This meant that you could target the entire ecosystem of IBM PC compatible hardware with a single test pass, which significantly reduced your costs.   You still had to deal with the video and printer issue however.

Now along came Windows 1.0.  It virtualized the video and printing interfaces providing, for the first time, a consistent view of ALL the hardware on the computer, not just disk and memory.  Now apps could write to one API interface and not worry about the underlying hardware.  Windows took care of all the nasty bits of dealing with the various vagaries of hardware.  This meant that you had an even more stable platform to test against than you had before.  Again, this is a huge improvement for ISV’s developing software – they no longer had to wonder about the video or printing subsystem’s inconsistencies.

Windows still wasn’t an attractive platform to build on, since it had the same memory constraints as DOS had.  Windows 3.0 fixed that, allowing for a consistent API that finally relieved the 640K memory barrier.

Fast forward to 1993 – NT 3.1 comes out providing the Win32 API set.  Once again, you have a consistent set of APIs that abstracts the hardware and provides a constant API set.  Win9x, when it came out continued the tradition.  Again, the API is consistent.  Apps written to Win32g (the subset of Win32 intended for Win 3.1) still run on Windows XP without modification.  One set of development costs, one set of test costs.  The platform is stable.  With the Unix derivatives, you still had to either target a single platform or bear the costs of testing against all the different variants.

In 1995, Sun announced its new Java technology would be introduced to the world.  Its biggest promise was that it would, like Windows, deliver platform independent stability.  In addition, it promised cross-operating system stability.  If you wrote to Java, you’d be guaranteed that your app would run on every JVM in the world.  In other words, it would finally provide application authors the same level of platform stability that Windows provided, and it would go Windows one better by providing the same level of stability across multiple hardware and operating system platforms.

In Jim Gosling post, he’s just expressing his frustration with fact that Linux isn’t a completely stable platform.  Since Java is supposed to provide a totally stable platform for application development, just like Windows needs to smooth out differences between the hardware on the PC, Java needs to smooth out the differences between operating systems.

The problem is that Linux platforms AREN’T totally stable.  The problem is that while the kernel might be the same on all distributions (and it’s not, since different distributions use different versions of the kernel), the other applications that make up the distribution might not.  Java needs to be able to smooth out ALL the differences in the platform, since its bread and butter is providing a stable platform.  If some Java facilities require things outside the basic kernel, then they’ve got to deal with all the vagaries of the different versions of the external components.  As Jim commented, “They are all close, but not the same.”  These differences aren’t that big a deal for someone writing an open source application, since the open source methodology fights against packaged software development.  Think about it: How many non open-source software products can you name that are written for open source operating systems?  What distributions do they support?  Does Oracle support other Linux distributions other than Red Hat Enterprise?  The reason that there are so few is that the cost of development for the various “Linux” derivatives is close to prohibitive for most shrink-wrapped software vendors; instead they pick a single distribution and use that (thus guaranteeing a stable platform).

For open source applications, the cost of testing and support is pushed from the developer of the package to the end-user.  It’s no longer the responsibility of the author of the software to guarantee that their software works on a given customer’s machine, since the customer has the source, they can fix the problem themselves.

In my honest opinion, platform stability is the single thing that Microsoft’s monoculture has brought to the PC industry.  Sure, there’s a monoculture, but that means that developers only have to write to a single API.  They only have to test on a single platform.  The code that works on a Dell works on a Compaq, works on a Sue’s Hardware Special.  If an application runs on Windows NT 3.1, it’ll continue to run on Windows XP.

And as a result of the total stability of the platform, a vendor like Lotus can write a shrink-wrapped application like Lotus 1-2-3 and sell it to hundreds of millions of users and be able to guarantee that their application will run the same on every single customer’s machine. 

What this does is to allow Lotus to reduce the price of their software product.  Instead of a software product costing tens of thousands of dollars, software products costs have fallen to the point where you can buy a fully featured word processor for under $130.  

Without this platform stability, the testing and development costs go through the roof, and software costs escalate enormously.

When I started working in the industry, there was no volume market for fully featured shrink wrapped software, which meant that it wasn’t possible to amortize the costs of development over millions of units sold. 

The existence of a stable platform has allowed the industry to grow and flourish.  Without a stable platform, development and test costs would rise and those costs would be passed onto the customer.

Having a software monoculture is NOT necessarily an evil.