It starts with shipping


Call me “old school” but I believe in shipping. Trying isn’t enough. Getting close isn’t enough. Good ideas aren’t enough. You’ve got to ship.


It used to be that interviews started with, “What have you shipped?” If you hadn’t shipped recently, “Why?” Why? Because you can’t deliver customer value if you don’t deliver. You can’t iterate and improve without finishing an iteration. You can’t get customer feedback without customers.


People used to complain that promotions and rewards were disproportionately distributed to those who shipped. I say, “Absolutely, that’s how it should be.” Does this hurt quality? No, you set a high minimum quality bar and ship. Does it hurt innovation? No, innovators have always risked an initial drop in pay to receive a big payoff should they deliver.



Eric Aside


Some people complain that the big payoff doesn’t exist at Microsoft for innovative ideas. Those people haven’t shipped. The people who successfully ship innovative ideas are the ones who become our organizational and technical leaders.


It all starts with shipping. This is particularly apt with services, where everything literally starts with shipping, and where I’m focusing the rest of this column. Our critics claim that in the new world of services Microsoft has forgotten how to ship. Perhaps, but Microsoft has forgotten more about shipping than most companies will ever know. We just need some reminders and reeducation, especially when it comes to services.



Eric Aside


Does a focus on shipping drive death marches? No, death marches delay shipping. As I wrote in “Marching to death” (chapter 1), death marches result from a lack of planning and courage. This is particularly important to understand in the services world, where sustainable shipping is critical to long-term success.


I offer you my service


How much about shipping services has Microsoft forgotten or doesn’t get, according to critics? Not as much as they would have you believe, but enough to make you think. Let’s go over the herrings and the heartaches, mixed with a little happiness.


The red herrings:


§  Services make you think about everything differently.


§  Services center on data while packaged products center on functionality.


§  Services have greater security concerns than packaged products.


§  Services have serious issues with dependencies.


§  Services demand higher quality and faster iterations than packaged products.


The heartaches (and happiness):


§  Services run across hundreds of machines, not on a single client.


§  Services must scale out automatically.


§  Services are easier to switch than packaged products.


§  Service upgrades hit everyone instantly.


§  Services are living, changing things.


Let’s break these down, starting with the red herrings.


What is that smell?


The first services red herring is a big one, “Services change everything.” As I addressed in “At your service,” this is total bovine fertilizer. Services start and end with helping customers achieve their goals, just like all products ever. You focus on the customer experience and what they hope to accomplish or you lose. End of story.


The next three red herrings—centering on data, security concerns, and dependency issues—all apply just as well to shipping packaged products, though it may have taken us longer to realize it. You can’t expose data format changes to customers without chasing them away, on the client or the server. There isn’t a computer product or service today that isn’t vulnerable to attack—you must secure them all. Finally, if you think external dependencies aren’t problematic on the client, you clearly don’t use many drivers. I’m not saying these aren’t real issues—I’m saying they aren’t new or specific to services.


The last red herring is among the most common concerns raised about why shipping services differs from shipping packaged products—high availability and Internet time. Look, it’s not okay for packaged products to never work or require a reboot every time you use them; at least it hasn’t been for quite some time. The quality bar is no different for services, though there are plenty of services that fail constantly.


As for Internet time, that hit packaged products a decade ago with the introduction of Windows Update. And if you think that those patches are just security fixes, you haven’t been paying attention. More and more we are fixing all kinds of experience issues shortly after customers report them, for services and packaged products. That’s a great thing for customers.


However, gradually improving the customer experience every month or every day isn’t enough. Both services and packaged products need to ship significant, orchestrated updates to deliver breakthrough customer value. Facebook wasn’t going to gradually update itself into Twitter any more than Vista would gradually update itself into Windows 7. You must focus on what the customer is trying to accomplish, and sometimes that isn’t a quick change.



Eric Aside


The best way to learn how to ship is to do it early and often. Make every build a shippable build. Build every day and rebuild the entire system at least every week. Deploy regular tech previews and betas. Deploy regular incremental updates and fixes into production. Ship early, ship often. Practice makes perfect.


There are too many of them


However, not everything about shipping packaged products applies to shipping services. There are mental, process, and team adjustments that you need to make.


First and foremost is that services run across hundreds or thousands of machines dispersed in multiple data centers worldwide. Sometimes functionality and data are replicated. Sometimes functionality and data are specialized. Usually, it’s a combination of both for scale and reliability. Naturally, this presents design and synchronization problems, but plenty of books have been written about that (read, don’t rediscover). The less obvious challenges are around debugging and deployment.


Why is debugging a service so tough? Timing issues are killer given multiple threads on multiple processors across multiple machines. Yikes! However, that’s not even the toughest challenge.


What’s the first thing you do when debugging an issue? Analyze the stack, right? With services the stack is split across servers and requests, making it nearly impossible to trace a specific user action. The good news is that there are new tools that help tie user actions together across machines. The bad news is that this isn’t the toughest challenge either. The toughest challenge is that you’re always debugging in the live environment. You don’t get symbols, breakpoints, or the ability to step through code.


So let’s recap. Debugging services means debugging nasty timing issues across multiple machines with no stack, symbols, or breakpoints on live code. There’s only one solution—instrumentation—and lots of it, designed in from the beginning, knowing you’ll soon be debugging across live machines with no stack, symbols, or breakpoints.
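As a rough sketch of what designed-in instrumentation might look like: every log record carries a correlation ID created once per user action, so records written by different servers can be stitched back into something resembling a stack. The function names, server names, and record fields below are hypothetical, and an in-memory list stands in for a real log store.

```python
import json
import time
import uuid


def new_correlation_id() -> str:
    """Create an ID once, at the edge, for each user action."""
    return uuid.uuid4().hex


def log_event(sink: list, correlation_id: str, server: str, event: str, **fields) -> None:
    """Append one structured record; a real service would ship it to a log store."""
    record = {
        "ts": time.time(),
        "correlation_id": correlation_id,
        "server": server,
        "event": event,
        **fields,
    }
    sink.append(json.dumps(record))


def handle_backend(sink: list, correlation_id: str, item_id: int) -> bool:
    """Hypothetical back-end call that logs under the caller's correlation ID."""
    log_event(sink, correlation_id, "backend-07", "lookup.start", item_id=item_id)
    found = item_id % 2 == 0  # stand-in for real work
    log_event(sink, correlation_id, "backend-07", "lookup.end", found=found)
    return found


def handle_frontend(sink: list, item_id: int) -> str:
    """Hypothetical front-end entry point; returns the ID it assigned."""
    cid = new_correlation_id()
    log_event(sink, cid, "frontend-01", "request.start", item_id=item_id)
    found = handle_backend(sink, cid, item_id)
    log_event(sink, cid, "frontend-01", "request.end", status=200 if found else 404)
    return cid


def trace(sink: list, correlation_id: str) -> list:
    """Rebuild the sequence of events for one user action from the merged logs."""
    records = [json.loads(line) for line in sink]
    return [r["event"] for r in records if r["correlation_id"] == correlation_id]
```

Calling `trace(sink, cid)` after a request returns that request's events in order, even when records from many other requests are interleaved in the same log.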


They’re multiplying too rapidly!


Solving debugging brings us to the other huge challenge—deployment. Deployment needs to be completely automated and lightning fast. We’re talking file copy installation, with fast file copy. No registry, no custom actions, and no manual anything.


Why does deployment need to be so fast and simple? Two reasons:


§  You’re installing onto hundreds or thousands of machines worldwide while they are live. Installation must work and work fast with zero human intervention ever. The slightest bit of complexity will cause failures. Remember, five minutes times 1,000 machines equals three-and-a-half days. It had better just work.


§  The number of servers needs to grow and shrink dynamically based on load. Otherwise, you are wasting hardware, power, cooling, and bandwidth in order to meet the highest demand. Because your scale depends on load, it can change any time. When it changes, you need to build out more systems automatically and instantly.
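The arithmetic in the first point is easy to check, and it also shows how much parallel installation helps. This is only a sketch: the function name and the batch size of 50 are illustrative assumptions, not a recommendation.

```python
import math


def rollout_days(machines: int, minutes_per_install: float, parallel: int = 1) -> float:
    """Wall-clock days to install on every machine, `parallel` machines at a time."""
    batches = math.ceil(machines / parallel)
    return batches * minutes_per_install / (60 * 24)


serial = rollout_days(1000, 5)                 # one machine at a time
batched = rollout_days(1000, 5, parallel=50)   # hypothetical batch size

print(f"serial:  {serial:.2f} days")
print(f"batched: {batched * 24 * 60:.0f} minutes")
```

One machine at a time works out to roughly three and a half days, while batches of 50 bring the same rollout down to under two hours. Either way, any manual step gets multiplied by the same machine count.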


The happiness around deployment is that Azure will do most of the heavy lifting for you (so let it, don’t reinvent). However, you still need to design your services to support file copy installation.


Life is so uncertain


Enough of the challenges you can predict. How about the unpredictable ones? The services landscape is in constant change. While some services are sticky because they hold your data (like Facebook or eBay), many aren’t sticky at all (like search or news). A few minutes of downtime can cost you thousands of customers. Data compromise or loss can cost you millions of customers. They’ll just switch. Our competitors will be happy to accept them. It cuts both ways, so you need to work hard to both welcome and keep new users.


When you update a service everyone gets the new version instantly, not over years. If there’s a bug that only one customer in a thousand experiences, then that bug will hit thousands of customers instantly (law of truly large numbers). That means you need to resolve the issue quickly or roll back. Either way, it’s a bad idea to update a service on a Friday and a good idea to have an emergency rollback button always at the ready.
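To put numbers on that, here is a minimal sketch. The user count is a made-up figure, and the `Releases` class is a hypothetical stand-in for whatever your deployment system uses to track versions. The point is only that the previous version stays one step away.

```python
class Releases:
    """Tracks deployed versions so the previous one is always one step away."""

    def __init__(self, initial: str):
        self.history = [initial]

    @property
    def live(self) -> str:
        """The version currently serving customers."""
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        """The emergency button: drop the live version and return to the prior one."""
        if len(self.history) < 2:
            raise RuntimeError("nothing to roll back to")
        self.history.pop()
        return self.live


def affected_users(active_users: int, bug_rate: float) -> int:
    """Expected number of users who hit a bug with the given per-user rate."""
    return round(active_users * bug_rate)


releases = Releases("v1")
releases.deploy("v2")
print(affected_users(5_000_000, 1 / 1000))  # hypothetical user base of five million
print(releases.rollback())
```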


Finally, it’s important to realize that services are living, changing things. You’d think that because the servers are all yours with your image and your configuration that it would be a controlled environment—and it is until you turn on the switch. Once the server goes live, it changes. The memory usage changes, the data and layout on the disks change, the network traffic changes, and the load on the system changes. Services are like rivers not rocks. You can’t ship and forget services. They need constant attention. To make your life easier, bake resilience in by automating the five Rs—retry, restart, reboot, reimage, and replace (though replace may require human hands at some point).
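One way to picture automating the five Rs is as an escalation ladder: try the cheapest remedy first, check health, and move up only if that fails. The sketch below assumes a hypothetical health check and hypothetical remedy callbacks. `FakeServer` exists only to make the example runnable.

```python
from typing import Callable, Sequence, Tuple

FIVE_RS = ("retry", "restart", "reboot", "reimage", "replace")


def recover(is_healthy: Callable[[], bool],
            actions: Sequence[Tuple[str, Callable[[], None]]]) -> str:
    """Apply remedies in order of increasing cost; stop at the first that works.

    Returns the name of the remedy that restored health, "none" if the server
    was already healthy, or "escalate" when every automated step failed.
    """
    if is_healthy():
        return "none"
    for name, action in actions:
        action()
        if is_healthy():
            return name
    return "escalate"  # time for human hands


class FakeServer:
    """Stand-in that only recovers after a reboot or anything stronger."""

    def __init__(self):
        self.healthy = False
        self.log = []

    def apply(self, name: str) -> None:
        self.log.append(name)
        if name in ("reboot", "reimage", "replace"):
            self.healthy = True


server = FakeServer()
# Bind each name at definition time so every callback applies its own remedy.
actions = [(name, lambda n=name: server.apply(n)) for name in FIVE_RS]
print(recover(lambda: server.healthy, actions))  # prints "reboot"
```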


The happiness that comes with these heartaches is threefold: customers willing to switch; an ideal idea-testing platform, because you can show customers different ideas and see which they prefer on a daily basis; and the ability to ship now and find the tricky intermittent Heisenbugs later (using your five Rs resilience to keep up availability).


Back to basics


There you have it. Some food for thought mixed in with the old basics of writing solid code that focuses on customers and their goals.


However, none of this is worth anything without shipping. Make shipping a priority and we all win. Sure, the quality bar has gone up, but we’re not kids selling lemonade anymore. We need to ship quality experiences regularly, on both long and short time scales. We need to ship on the Internet, on the PC, and on the phone. We need to serve our customers well and delight them into sticking with us. It’s a long journey, but it doesn’t start until we ship.

Comments (4)

  1. Anonymous says:

    I’ve worked at five other companies before Microsoft, and they all knew how to ship.  Microsoft is the only place I’ve worked that can’t manage to get bits out to customers.

    Nine re-orgs in two-and-a-half years.  Nine!  Just yesterday politics and insufficient resources killed the project my coworkers were on.  They were at code complete before it was decided that there wouldn’t be enough test resources and that the coolness of the feature might embarrass another product team.  People are angry, sure, but nobody is surprised.  It happens with such regularity that only an intern would raise an eyebrow.

    The only Ship It award I’ve ever received here is for a product I didn’t work on.  In our group, we’re going to start handing out Scrap It awards.

    The constant randomization from five to six levels up the hierarchy makes shipping virtually impossible.  Re-orgs, strategy changes, and political turf battles obsolete our work with more regularity than Moore’s Law.  Mix in inconsistent guidance and frequent decision reversals from the legal, privacy, and security groups, and you’ll be extremely lucky to get something out the door, let alone have a chance to iterate on it.

    Shipping early and often is generally good.  (The oft-overlooked caveat is that every release creates legacy that can slow future development.)  In my 20-year career, Microsoft is the only place I’ve worked that cannot execute a plan to completion.

  2. ericgu says:

    I agree with most of what’s here, but I’d like to comment on one sentence:

    "Microsoft has forgotten more about shipping than most companies will ever know."

    Not only does this sound quite arrogant – and embody the worst of Microsoft rather than the best – I’m curious how you came up with your data.

    My observation is that there are lots of companies that manage to ship high-quality software (either client or services) on a regular basis. The "we’ve forgotten more than you know about development" attitude is a prime reason that Microsoft has a lot of these issues.

  3. It’s suddenly stormy again here in Seattle, and I see I. M. is keeping his pate warm. I. M.’s new column,