Why does it take so long to ship Hello World?

A recent comment on a Slashdot story about the Longhorn / WinFS announcement asks why WinFS is taking so long to develop, and predicts that once it is released there will be an open source clone within months. (Something similar happened with the CLR and the Mono project; it took thousands of person-years for Microsoft to develop the .NET Framework, but much less time for the Ximian folks to build a compatible implementation.)

That's just the way the world works.

Let's take for example everyone's favourite program, Hello world. Once K&R published "Hello world," everyone knew what it was supposed to do and they could trivially write clones of it. People started writing "Hello world" in different languages (both human and computer). They started adding bells and whistles, like printf("Hello %s\n", argv[1]) so that you could get the computer to say "Hello" to whatever name was given on the command line (or complete garbage if no such argument was given :-) ). People did all sorts of things, and it only took a few short seconds to whip up a "Hello world" clone and share it with your friends.

But how long did it take K&R to write the original? Obviously I don't know, but based on how things happen at Microsoft, I can imagine...

Please note that this post is not an "open source only copies what Microsoft does" post. It is simply a "designing and building a new product takes a long time" post.

First of all, you have to think about the "customer" and the "scenarios." Sure, you're writing a book about the C programming language, but who is the real customer? Is it solely for college kids studying CS-101? What is "hip" and "cool" for college kids these days? How could the program -- their very first introduction to the language -- best connect with the Youth of America?

Or is it for professional programmers, already well versed in COBOL or FORTRAN or some other language? What kinds of day-to-day tasks would they be familiar with that your program could emulate so as to show how they could translate their existing skill sets to C? (Or were you trying to show how much faster / better / more powerful / easier to use / etc. C was than That Other Language?) You could spend weeks writing long e-mail threads, having heated debates at meetings, hiring consultants to do user studies and market sizing activities and so on just trying to answer this simple question.

The average room full of monkeys could have banged out at least a couple dozen variations on "Hello world" in the time it took you to decide this.

So now you've figured out who the customer is and what scenarios you're trying to enable (print a simple message on the screen)... what next? OK, what should the exact text of the message be? How universally recognisable is it? Will it offend anyone of a different culture? How easy is it to localise it for other languages? Will it generate Product Support calls because people don't understand the message? How easy is it to test that the correct text was displayed once the program has run?

After some more meetings and e-mail threads, you settle on "Hello, world" as the text because you figure "Hello" is a pretty universally recognised greeting, and "world" is pretty inclusive of all peoples on the planet (hopefully no aliens will be running your program!). But now someone in the group brings up the issue of extensibility. Presumably you're writing this program so that other people can build on it, right? So shouldn't it include at least some form of customisability, or provide obvious entry points for further expansion (such as the snippet listed above)? What good is "Hello world" if all it can do is print out "Hello world" and there's no way for the customer to modify it to meet their critical business needs?

So now you re-visit your target customer / scenario decision for another week, just to make sure that ease-of-extension-to-solve-critical-business-needs is not one of your goals for this program. Now a month has passed, but at least everyone is on board with The Vision for the program -- it's just a program to print out "Hello, world" and is not the foundation for Microsoft Excel or SQL Server.

Meanwhile, the room full of monkeys has shipped "War and Peace."

In four different languages.

OK, you're going to display some text to the user, and the text is "Hello, world" -- but how do you get the text on the screen? (We'll pretend that GUIs and web browsers and so on haven't been invented yet, so there's no debate about which windowing API or which GUI framework to use, etc.) Do you use printf or puts (or fputs or fputc in a loop or...)? Obviously printf is more powerful and lets the customer experiment more with the program, but that's not the goal of your program (see previous paragraph). The puts function is less powerful, but it will automatically put a newline at the end of the string and not confuse newcomers with the strange \n syntax. (Oooooh, a new issue to track! Do we need a newline at the end of the string or not? Let's set up a meeting!)

You decide after some time that although the goal of the first program isn't to be immediately extensible by end-users, you do want to build on it in the book to introduce new concepts and so for that reason printf is the way to go. Having the \n in there is a bit confusing, but it lets you talk about character escapes which the users need to figure out pretty darn soon anyway. Some people on the team have reservations about using printf (it will cause customer confusion and hence generate calls to PSS), but time is marching on and you have to ship something soon (the publisher keeps calling wanting to know how the book is coming along).

OK, you finally have the program code written, but for historical reasons the source file is named kosciusko.cpp (the code name of the project was "Mt. Kosciusko") and the legal department doesn't want you to ship it that way. You hastily get together with everyone on the team and decide to re-name the file to hello.cpp, re-run all your tests, update all the documentation, etc. and a week later you're good to go.

By this time, the monkeys have formed their own advanced civilisation and invented JScript .NET, thereby rendering your new "C" language completely irrelevant.

And we didn't even go into the details of testing, localisation, globalisation, documentation, support, usability, accessibility, security, servicing, versioning, marketing, evangelism, training, and so on. The point is that it takes a very long time to design and build a brand new product that will be used by tens (or hundreds) of millions of people, many of whom have little or no knowledge of how computers work. Once the design is done and the first version has shipped, banging out the code to make a clone is relatively easy. It's also possible to build a better / faster / more feature-rich version, too, because you have a "known quantity" to work with. You can take "Hello, world" and add command-line arguments to it pretty easily, because all the hard work (like figuring out that printf was the right function to use) has already been done!

A great book that goes into this process at Microsoft in more detail is I Sing the Body Electronic, although it is now over ten years old. Showstopper! is another classic book about the making of Windows NT, although it too is quite old. A more recent book that shows some similar problems defining and building games at id Software is the entertaining Masters of Doom.