Fixing five fundamental flaws

After decades as a professional software engineer, working for six different firms (large and small), I can honestly say that Microsoft is by far the best. I can also honestly say that Microsoft is far from perfect.

My monthly rants typically focus on problems that individual engineers or managers can change by being better individual engineers and managers, by using different approaches or tools, or by altering the way they think about issues. However, Microsoft also has system-wide issues. I know how to solve them, but Microsoft executives and engineers may not like my solutions.

Too bad. I’m throwing my ideas out there this month. Don’t like them? Come up with something better. No company is perfect, but we owe it to ourselves to never be satisfied with the status quo.

Eric Aside

All opinions expressed in this column (and every Hard Code column) are my own and do not represent Microsoft in any official or unofficial capacity.

There is something terribly wrong

What’s wrong with Microsoft? I’ve narrowed it down to five fundamental flaws.

  1. We’re top-heavy. We’ve up-leveled lead and group manager roles. Nearly 80 percent of development leads have been at Microsoft for six years or more, with roughly 50 percent here for 10 years or more. This up-leveling clogs career advancement, reduces the influx of new thinking, and drives the use of outdated practices. Up-leveling was supposed to deliver better and fewer managers, but opportunities for growth and innovation need to balance that goal.
  2. We’re overstaffed and overfunded. Having lots of engineers and money enables Microsoft to accomplish big, bold breakthroughs. It also enables Microsoft to isolate divisions, duplicate infrastructure and services, fail slowly, and generally throw money and people at problems instead of thinking and simplifying.
  3. Our reward system fails. The time we spend debating and marginally improving the review system is only exceeded by the time we spend in calibration, assessment, and the rest of the review process. Even so, we insult and drive away people we value, and we don’t promote people promptly when they are demonstrably ready.
  4. We disregard previous experience. Microsoft culture places little value on what you’ve done before—the value is in what you’ll do next. That’s really nice in theory, but in practice we ignore skills developed previously, don’t build upon gained knowledge, and repeat past mistakes. This destroys industry hires, slows innovation, stifles reuse, lowers productivity due to constant restarts, and demoralizes the engineering staff.
  5. We replicate infrastructure. Divisions, and often individual teams, reinvent their own build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics. We end up with wasted effort, poor systems, and difficulty working across teams that use different tools and methods.

A great Microsoft cultural edict is, “No whining. Accept it and move on, or come up with a better solution.” Let’s talk about why these five issues exist and what can be done about them.

Why? Why?

Being top-heavy is partially a result of being a successful, mature company. People move up, plateau eventually, and stay put. The company isn’t doubling its staff anymore, so the percentage of new blood declines. To fix this, we must grow much faster or make room somehow (org design or attrition).

Being overstaffed and overfunded is a direct result of success. We could trim, but that’s painful. We could change, but that’s hard and scary. The solution is to trim and change anyway.

Our reward system pays for individual performance based on a curve, and not everyone will be happy. The steady performers have trouble being promoted because they don’t stand out. Those who stumble get a disproportionate punishment. Even those who do a great job keeping up with their peers feel jilted by an average review. Only those who receive a well-deserved promotion (or expulsion) feel a sense of fairness. The solution is staring at us—it’s so obvious that it’s easy to dismiss (more below).

We disregard previous experience because, at first, it wasn’t important. All that mattered in Microsoft’s early days was “Can you code all day and night, loving every minute of it?” Now history matters. We’re in an established industry, with legacy codebases, huge projects, and billions on the line. We should still focus on the future, but we should also place people in roles where their past experience is most beneficial.

We replicate infrastructure when we don’t have a supported solution to turn to that operates well at the large scale of Microsoft projects. We do share Source Depot, Active Directory, Product Studio, SharePoint, Office, Exchange, and Windows Server. These are commercial products we build and use ourselves or internal products (not tools—products) that were designed and intended to be shared (though internal products don’t get as much love). Build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics will need to be well-supported and scalable products if we hope to share them.

Eric Aside

Source Depot is our internal source control system. Product Studio is our internal bug tracking system.

You got a better idea?

What should we do? Let’s start with people issues. Please keep in mind, I speak for no one but myself.

At the start of every major product cycle, like a new major revision of Office, high-level planning already focuses on direction, key scenarios, and tenets. Part of that planning should also specify the number and kind of engineering triads needed based on what the high-level plan specifically requires—not based on current staffing.

There aren’t that many types of triads in engineering: UI (heavy on designers, light on architects), kernel (heavy on architects, light on designers), and midlayer (average number of architects, some API design). These triads come in small, medium, and large sizes. Thus, staff planning comes down to selecting the number and size of each type of triad necessary to build the product planned.

To break down each triad type, we can use statistical values from our people data to determine the small, medium, and large makeup (how many in each discipline and level band). Microsoft is big enough to give us a valid sample size. These breakdowns should be reviewed as a sanity check and also to balance growth and new hires.
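The arithmetic behind that staff plan is simple enough to sketch. The snippet below is only an illustration: the discipline names, the makeup of each triad size, and the triad counts are all hypothetical placeholders, not Microsoft people data. It multiplies the number of triads of each type and size by an assumed per-triad makeup and totals the headcount by discipline.

```python
from collections import Counter

# Hypothetical makeup per (triad type, size): headcount by discipline.
# Every number here is a placeholder for illustration, not real people data.
TRIAD_MAKEUP = {
    ("ui", "small"): {"dev": 4, "test": 4, "pm": 2, "design": 2},
    ("kernel", "medium"): {"dev": 10, "test": 8, "pm": 3, "architect": 2},
    ("midlayer", "large"): {"dev": 16, "test": 14, "pm": 5, "architect": 2},
}


def staff_plan(triads_needed):
    """Total headcount by discipline for the requested number of triads."""
    total = Counter()
    for triad, count in triads_needed.items():
        for discipline, heads in TRIAD_MAKEUP[triad].items():
            total[discipline] += heads * count
    return dict(total)


# Hypothetical high-level plan: three small UI triads, one medium kernel
# triad, and two large midlayer triads.
plan = staff_plan({
    ("ui", "small"): 3,
    ("kernel", "medium"): 1,
    ("midlayer", "large"): 2,
})
print(plan)
print("total headcount:", sum(plan.values()))
```

The point of the sketch is that the plan is driven by the triads the product requires. Current staffing never appears as an input.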

Eric Aside

I did this triad breakdown exercise six years ago. It only took a day or two, and the data wasn’t that surprising or diverse (most triads of the same kind were roughly the same size). The biggest variances were in sustained engineering teams, due to basic differences in approach.

I’m focused on engineering product group triads in this column, but the same staff planning approach can be used in sales, finance, HR, consulting, operations, and anywhere else in a large, mature company like ours.

I’ve got a job to do

The next step is to place people into the key leadership roles, both individual contributors and managers. New triads need leaders that are good at building teams. People and architectural turnaround cases require leaders who’ve revamped teams before. There may be special technology requirements too.

To find the right leaders for these roles, midyear career discussions can include identifying skillsets (validated by managers) that are later used to place people. These key leaders are told why they were chosen to take their specific roles, making it clear that their special skillsets are valued and needed.

Once the key leaders have been placed, the teams can be built from existing and new staff. While many folks will continue to fit well in their current areas, no job is a given—everyone has a chance to change positions or look elsewhere. Internal candidates can’t fill entry-level-hire spots. Higher-level people can’t fill lower-level roles. However, people can get promoted into positions if they are ready. Those who can’t find a role that fits are given a few months to find a position elsewhere at the company or accept a severance package.

A variation of the solution I describe above has been used in large part by Office and Windows for years. I’ve added recording skillsets during midyear career discussions, emphasizing people’s past experience, preventing higher-level people from filling lower-level roles (avoids job inflation), and enforcing layoffs of extra people. My plan is harsh, but it ensures that we get balanced, smaller, and more efficient based on the people we need in order to build our software.

Eric Aside

You might be worried about losing high-potential employees during this process. Chances are good that they’d find positions during org shuffles, but we could use HR’s existing tracking of high-potential employees to catch any unfortunate exceptions. Those exceptions could be put on special projects until new positions open elsewhere at the company.

You also might be worried about staffing shifts to handle unforeseen issues or opportunities. Those plan changes happen all the time today. The purpose of high-level planning isn’t to create a perfect, immutable plan; it’s to think ahead about what you really want to achieve and how you’d like to achieve it, including how to best staff the project.

Just rewards

As for the review curve, we kill it. Period. Microsoft has slathered lipstick on our pig of a review model three different times (adding lip gloss tweaks annually). It doesn’t work—the review curve is still a pig.

Instead, we should replace the annual review with a second career discussion. Outrageous and unacceptable, right? Wrong. The three states of employee performance—doing well, moving up, and moving out—can all be addressed in a career discussion:

  • We already focus on moving out poor performers. Career discussions and one-on-ones with managers let people know in advance when they are falling behind. While people can be (and are) dismissed at any time, retaining only the employees we need to build each major revision of our software keeps everyone honest about who is meeting or exceeding expectations.
  • Today it takes too long to move people up, and we lose many as a result. Instead, we can focus semiannual career discussions and calibrations on promotions (and problems). We can design our triads to have an appropriate number of positions at each level, and we can reject candidates above the target level to prevent job inflation. We can make room for promotions at the desired rate and promote proven people promptly. That’s what they want—it’s the ultimate pay for performance.
  • Everyone else is performing well, working in roles we determined are essential for shipping our products and services. They all get paid the same generous salaries, taking into account their level and current market compensation. We can even give bonuses based on division results, encouraging collaboration and focusing on collective success.

In other words, we make the review system about growth, effectively filling essential roles, and promotions. A forced, abstract, insulting curve has no place in the system.

Eat it. EAT IT. Eat it.

Fixing the final fundamental flaw of Microsoft, replicating infrastructure, is just as obvious as the review system fix. We need our internal build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics to be supported and scalable products (preferably commercial products that get proper attention).

There are a few different ways to make our internal systems commercial products.

  • We could take our existing internal systems and productize them. This strategy isn’t likely to work because the systems weren’t designed for commercial use. They are effective, but often fragile, difficult to maintain, poorly documented, and tightly coupled to individual division business practices. Plus, every division has its own.
  • We could use existing commercial products (including open source) to replace our internal systems. This strategy isn’t likely to work either. Microsoft produces many of the largest and most complex software products in the world. Handling that scale and complexity is beyond the capacity of commercially available engineering systems that cover the full software lifecycle.
  • We could exclusively use our existing commercial products—Azure, Visual Studio, and Team Foundation Server—for our own infrastructure. At Microsoft, we call using our own products “eating our own dog food,” or “dogfood” for short. We dogfood all our products except build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics. Clearly, it’s time we start dogfooding these as well.

Many Microsoft teams do use Azure, Visual Studio, and Team Foundation Server for their infrastructure, but not exclusively. The build, deployment, test, localization, monitoring, and engineering analytics in these products are still relatively young. It would be painful to switch to them before they fully mature, and most of our customers don’t operate at Microsoft’s scale.

But it was painful to switch to Exchange. It was taxing to switch to Windows Server, SharePoint, Outlook, and Active Directory. It’s always unpleasant—that’s why we call it dogfood. However, going through that pain has led to successful products that scale to meet the demands of any enterprise. Scaling Visual Studio and Team Foundation Server to build Windows, or scaling Azure to deploy, monitor, and operate Bing, won’t happen overnight. However, right now that’s not even our stated goal. It’s time Microsoft made the commitment to dogfood Azure, Visual Studio, and Team Foundation Server.

Eric Aside

Wonder what it would take to re-architect Azure, Visual Studio, and Team Foundation Server so that they scale to the largest workloads? Look no further than the dozens of full-scale teams that design, develop, operate, and maintain each division’s solutions today. Don’t think it’s worth it? Look again.

Tough love

I love Microsoft, but we’re getting top-heavy, overstaffed, and inefficient. Our review system is broken, past experience is disregarded, and money and effort are wasted on incongruent infrastructure.

We can fix these flaws. It’s not that complicated, and it’s not that radical. We can switch from a seemingly arbitrary and insulting curve to a thoughtful people plan. We can switch from division-specific “not invented here” to our own proudly invented and used superstructure.

Don’t like my solutions? Write to me and suggest something better. Wish my ideas were implemented? Use your network within Microsoft to spread them around. Transformation starts with people like us.

It will take time. It will be painful. It will be difficult. But when has Microsoft shied away from a challenge? We have a chance to be the first technology company to remain a market leader from the birth of an industry through to its full maturity. Ford couldn’t do it. Philips and RCA couldn’t do it. IBM couldn’t do it. Apple is getting there. Microsoft can do it. Let’s beat the odds and make changes that will keep us on top for generations to come.


Comments (7)

  1. I.M. Confused says:

    Eric, I'm a bit confused. I may be interpreting things incorrectly.

    In a previous blog post of…/out-of-calibration.aspx

    You say the following:

    "Personally, I like the new system. We’ll have to see how it works out, but initially I like the simplicity of a single rating, that it’s on a curve that better matches historic percentages"

    However in this post you say the following:

    "As for the review curve, we kill it. Period. Microsoft has slathered lipstick on our pig of a review model three different times (adding lip gloss tweaks annually). It doesn’t work—the review curve is still a pig."

    In one you are in support of the curve and the review process, yet in this one, you don't like it.

    Is it just a matter of you changing your mind? Or is it something more complicated I am missing?

    Thank you for the great blog posts.

  2. MSIT says:

    I may be very wrong, having only 6 months of experience in the industry, but isn't MSIT already adopting some of the practices mentioned here (a functional rewards system; using our own products such as TFS instead of PS and SD, which I believe many teams migrated to 5-6 years ago or even earlier; the talent hub model, which recognizes past talent; etc.)?

  3. Eric Brechner says:

    To I.M. Confused, I still like the improvements made to our review system. It's simpler and more transparent than it was before. If we must use a curve and give every employee a grade on that curve, the current review system serves as well as I’ve experienced.

    However, the overall rewards system does fail us in many ways, as I detail in this month's post. No improvements to it will change that basic dynamic. I.M. Wright states that a bit more colorfully in the column, but that’s his style.

    This month was about examining a broad alternative that focuses our staff and its growth as a means to ship new releases of our software. We staff projects on the experience we need to ship the innovation we seek. We differentiate pay based on promotions and assignments instead of ratings on a curve. That alternative requires more frequent promotions, but I do feel we should have more frequent promotions.

  4. David Roh says:

    You have not even touched on the real Microsoft issues:

    – Total disregard for the care and feeding of developers trying to build commercial products with Microsoft tools and products.

    – Microsoft creates great new technologies, books are written, hours are spent by developers learning how to use them, then Microsoft abandons the technology – Silverlight, WCF RIA Services, etc.

    – Microsoft has started trying to force its users to do things Microsoft's way, removing choice from the user in both Microsoft commercial products such as Windows 8, and Microsoft tools such as Visual Studio 2012

    – Microsoft has stopped providing any kind of roadmap for its developer community

    And that's just the very top of the issues – never mind that Microsoft's research and development activity is almost non-existent now.

    Sinofsky is gone but his legacy lingers.

  5. ElCroc says:

    Interesting read. I too believe that David Roh's post nails the problems with MS a lot more closely than the original post, but the post itself does shed some small light on the reasons why.

    Summarising quotes due to space constraints:

    " David Roh – You have not even touched on the real Microsoft issues:

    Total disregard for the care and feeding of dev's"

    …MS aren't alone, take a look at Embarcadero's forums on their new hobby version and the flap it's caused with their tech partners

    " MS creates new tech, books written, hours spent by devs learning it, then MS abandons it – Silverlight, WCF RIA Services, etc"

    … both "disregard" and "abandon" points are exactly how .NET made me feel as a seasoned and I'd say advanced dev using VB6. I haven't programmed much since except as hobbyist, and my salary now reflects this. Thanks.

    " Microsoft forces users to do things MS way removing choice from user in products such as Windows 8, and tools such as VS 2012

    – Microsoft has stopped providing roadmap for dev community "

    … the "we can delete stuff from your harddrive" clause in Win8's licence is enough to put me off ever buying this, but on the dev side, WinRT is irrelevant to a desktop platform, and is slated to eventually replace the other technologies that devs have invested time and expertise learning to use. SERIOUSLY?

    I lost faith after VB6 was killed off, still care enough to read this post and respond, but Win8 and WinRT is close to turning me off permanently. In other trades you learn to use a toolset to create what you have in your head. In Windows dev, the tools keep transmorphing – pick up a hammer to do something only to find it made of rubber, or has changed into a screwdriver, and you need to pick up the new .NET hammer that works upside down to do the job, plus have to spend a week relearning how the new hammer works.

    WinRT is set to do this all over again for dev's invested in previous techs, just like .NET did for VB6.  And I won't even begin to rant on the vast increase in dev tool and MSDN sub costs since the early 2000's. I know I can't afford it, and I can't be alone.

    I'd LOVE to know who decided to create Win8 and WinRT:- " Hey chaps, let's regress back to when dosshell was a full screen task switcher, and forget why windows was a success – let's make windows not use windows at all and become a FS task-switcher again – if we call it just windows nobody will notice, right? Better, let's force it so we can make business desktop owners use a giant tablet on their desks, with a mouse, and put some app store in the forefront with a we-can-delete-this-stuff-if-we-want clause even if you paid for it "

    Sorry for ranting, I know OP was trying to address internal issues but this is honestly my take on the problems facing MS at this time. I think internal changes, if any, that MS make need to take account of past problems, and the future direction of both the company, and more importantly the product.

    Now tell me what do you think is wrong with MS?

  6. Steve Gourley says:

    Drop everything you are doing and fix the Internet Security problem.  The operating system is responsible for managing access to the computer resources.  Obviously it is failing.  The fault may not be entirely Microsoft's problem, but they are part of it and are big enough to persuade others to make the necessary changes.

    I resent the continuous updates to fix security problems and the computer performance hit when it is scanning for viruses that don't exist.

    Microsoft would be making a real contribution to computing if they did that.

  7. I.M.Serious says:

    IMO, the biggest problem with Microsoft is its lack of creativity and passion for solving 'real' problems. People are more focussed on chasing awesome looking/fancy/cool stuff and create a lot of junk. Creation of junk is one of the fundamental reasons why Microsoft doesn't have a good brand name. OK, why is junk created in the first place? Since the majority of people are smart, the only reason for this could be performance pressure, where people assume that they are supposed to create something new and change the world. People miss the 'why' part of it in the whole wild goose chase. That said, a viable solution could be to build a culture where people producing new innovative stuff and people making products better, faster and more customer friendly are equally rewarded. Instead of pushing third grade products to market, products should be ranked using programs involving both internal and external users, and released only when they meet a certain bar.