It’s calibration time at Microsoft. Time for managers to rank everyone in your peer group (same discipline, same career stage, same division) into five (and a half) ranges: the top 20 percent (and top 5 percent), the near top 20 percent, the middle 40 percent, the lower 13 percent, and the bottom 7 percent.
Calibration brings out the best in us—the best in our acidic, reproachful disdain. Engineers hate calibration because it’s not fair to great teams, in which everyone deserves high ratings, and because it discourages teamwork, since team members compete against each other for rewards. Managers hate calibration because it forces them to make hard choices, it punishes them for having a great team while rewarding their peers for having poor teams, and it creates uncomfortable conversations with their employees.
Well, I love calibration. That’s right, I love it! You weenies and whiners can go join some puritan, petite startup, while I count our billions and continue working with a top-notch staff. Hey, I’m huge on rewarding strong teams and teamwork. The fact that you think calibration discourages teamwork shows your ignorance. It’s time you got a clue.
Wisdom of crowds
There are teams and then there are divisions. They are not the same. A division has thousands of engineers. A team has between one and 12. You are calibrated against your peers in your division, not your team.
Yes, the few engineers on your team who are in your discipline and career stage are among the hundreds in your calibration group. So what? That’s rounding error. You’re not competing against your teammates; you’re being compared across your division.
“Yeah, but my boss says every team has to fit the curve!” Typically, group managers like their entire teams to fit the percentages as a starting point. I used to misunderstand this, thinking it applied within career stage and would persist in calibration. Now that I’ve been through many calibrations, I realize it’s like the initial guess in a root-finding algorithm. You’ve got to start somewhere, but it’s rarely where you end up. By starting with teams roughly meeting the percentages, you at least cover the common cases quickly. Nonetheless, managers still talk about everyone in the calibration group.
Employees often complain that the HR text descriptions of each rating range don’t match the actual definitions—the percentages. True, but those definitions are a handy guide for managers to determine a starting point for calibration.
What are you trying to say?
A common concern is, “Instead of rating teams against their results, we’re rating engineers against each other! Doesn’t that discourage teamwork?” No, it doesn’t. Remember, you’re compared against all the engineers in your discipline and career stage across your division—not just the few on your team. If you and your teammates perform better than others in your division because you collaborate well, then you and your teammates will rank higher.
Don’t get me wrong—managers can certainly use calibration to create a competitive environment within their teams, making them dysfunctional. But managers can create competitive, dysfunctional environments any number of ways. I discussed this at length in “Beyond comparison” (Chapter 9).
Calibration doesn’t discourage teamwork. However, calibration does have a message for employees—Microsoft pays for performance.
If you are not performing as well as other engineers in your discipline at your career stage, then you will not be paid as well as your peers. If you perform better than other engineers in your discipline at your career stage, then you will be paid extra—sometimes a great deal extra. True, “perform better” is subjective, and any subjective system can be abused, but managers are calibrated too. In the end, Microsoft seeks to reward sustained excellence.
We must prepare
How does the actual calibration meeting work? How do managers decide who is “performing better”? It starts with preparation. HR provides a standard spreadsheet that has tabs for every employee and a main table to capture the calibration results. Because it’s a spreadsheet, metrics are automatically calculated to help groups understand the distribution of performance.
The tab for each employee in the spreadsheet is filled out in advance of the calibration meeting. HR puts in the employee’s past review results and basic information (such as the employee’s name, level, discipline, and date of last promotion). The employee’s manager adds:
- What the employee accomplished against and beyond his or her commitments.
- How the employee accomplished those results. (Did the employee make friends or enemies along the way?)
- Proven capability the employee has demonstrated in the past (for context).
- Feedback the employee has received from peers.
- A promotion indicator with explanation.
- The recommended choice out of the five ranges.
Naturally, these tabs are easier to fill out after employees have submitted their own assessments and after the manager has received peer feedback. That’s why everyone is encouraged to fill out assessments and feedback at least a few days before calibration. Yes, I know that doesn’t always happen. (That’s another reason to talk to your manager regularly.)
Getting to know you
The actual calibration meeting is usually six to eight hours long (no joke). Typically, names of employees are put on 3×5 cards or Post-it notes. For each career stage, the cards of employees in that career stage are placed in one of the five ranges as a starting point.
One range at a time (typically highest first), managers talk about every employee in that range. In addition to describing what, how, proven capability, and feedback for each employee, the managers talk about why they feel those aspects align with the selected range. Other managers typically ask questions, particularly if the reasoning doesn’t align with their own. In cases where employees clearly don’t fit the initial choice of range, they are moved up or down accordingly, regardless of percentages.
Once everyone in a range is discussed, the number of employees in the range is compared to the target percentage. If there are too few in the range, managers discuss who might move up as they review the next range. If there are too many in the range, managers go back and further question the employees that seemed to fit least solidly in the range. This all continues until every range is discussed and the calibration model is complete.
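The count-versus-target check at the end of each range can be sketched in a few lines of Python. This is only an illustration, not the HR spreadsheet's actual logic: the range labels, the `range_gaps` function, and the sample counts are all made up, and only the 20/20/40/13/7 percentages come from the column.

```python
# Target share of the calibration group for each range, highest first.
# The percentages come from the column; the labels are illustrative.
TARGET_PERCENT = {
    "top": 20,
    "near top": 20,
    "middle": 40,
    "lower": 13,
    "bottom": 7,
}


def range_gaps(counts):
    """Return, per range, how many people it holds above (+) or below (-) target."""
    total = sum(counts.values())
    gaps = {}
    for name, percent in TARGET_PERCENT.items():
        # Round to whole people; real groups rarely divide evenly.
        target = round(total * percent / 100)
        gaps[name] = counts.get(name, 0) - target
    return gaps


# Hypothetical starting placement for a calibration group of 100 people.
counts = {"top": 25, "near top": 18, "middle": 40, "lower": 12, "bottom": 5}
print(range_gaps(counts))
```

A positive gap marks a range where managers would revisit the people who fit least solidly. A negative gap marks a range that might gain people as the next range is reviewed. The numbers only prompt the discussion; they don't decide who moves.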
That’s a bit extreme
A naïve manager exclaims, “But I’ve worked hard to create a high-functioning team. They should all be in the top range!” Congratulations! Is there no room for improvement? I’ll bet there is.
Remember, you aren’t comparing teammates to each other—you are comparing employees to their peers across the division. Usually, you’ve got a mix of engineers that work well together. Reward each accordingly, and help every engineer become the best employee possible.
“But what about lame managers and lame teams? They all should suffer!” That’s harsh—calm down. But there are bad managers who hire bad people, yet present them as average or good people. These managers manipulate the system until they are caught.
Even the best of the worst managers get caught within a few years—usually faster. They are replaced or their teams are disbanded. No system is perfect, but ours does get things right, given time, and my experience is that the process has gotten better since Microsoft started focusing more on management excellence.
That’s not fair at all
The naïve manager laments, “But what do I tell a solid employee who was in the bottom 7 percent? He completed all his commitments.” Perhaps the commitments were too easy for his level—but what’s done is done. The employee is still in the bottom 7 percent. He is not getting a bonus, a raise, stock, or a promotion. Instead, he is getting a tough message about moving up or moving out.
“But he’s a solid employee. How is that fair?” You’ve got a solid employee who’s not nearly as good as other engineers in the same division, in the same discipline, at the same career stage. That means you can replace that employee and likely achieve better long-term performance. So, either your employee can substantially improve, or he can find another place to be successful. I describe this in detail in “The toughest job” (Chapter 9).
It’s important to have appropriate commitments for your career stage—achievable yet challenging enough to meet expectations for growth. Some divisions calibrate commitments within career stages by having managers review the commitments of their staff with their leads and with their peers. You can read more about writing great commitments in “I’m deeply committed.”
I’m still here
Yes, Microsoft compares employees against other employees in similar roles at the same career stage. Microsoft pays for individual performance. But why not pay for team performance instead, or at least in addition?
Personally, I’d like an element of my pay to be based on my team’s performance. Perhaps it will someday. However, I wouldn’t want all my pay to be team-based. I work for Microsoft, not my team. When I switch teams, I’m still working for Microsoft. My pay needs to be at least partly based on my own performance, not my associates’.
I also believe Microsoft could better recognize and reward the wide range of personality types and skill sets needed to create a high-functioning team. We need to find more ways to ensure that teamwork is recognized when it enables as well as when it produces.
Even with its imperfections, Microsoft’s system of pay for performance sustains the top-notch engineering workforce that I have the pleasure and privilege to collaborate with every day. That quality wouldn’t exist if calibration didn’t force us to have hard conversations and value the best among us. I love it.
The percentage ranges I mention at the start of the column are part of the new review model introduced this year. In my 16 years with the company, I’ve seen three review models:
- 2.5 – 4.5/A – C: The model when I started in 1995. A 2.5 meant you were done—up or out within 6 months. A 3.0 was a warning—you’re off your game. A 3.5 was standard goodness. A 4.0 was great, and a 4.5 was wow! The stock ratings (A best, C worst) were hidden and unknown unless you were a group manager. Large divisions had a discipline calibration. All organizations had a cross-discipline calibration (in large divisions it followed the discipline calibration). The approach varied by vice president, and your rewards were determined separately from your ratings. Reviews happened twice a year.
- Underperformed, Achieved, Exceeded/20-70-10: The model we’ve had since 2006. Theoretically, the commitment ratings were not subject to percentages, but in actual practice they did conform to rough percentages in order to differentiate bonus amounts within a fixed budget. The contribution rankings were transparent and directly corresponded to percentages—top 20 percent, middle 70 percent, and bottom 10 percent. All divisions had a calibration based on discipline and career stage. Some organizations followed with a second cross-discipline calibration by career stage. The approach varied by vice president, and your rewards were determined separately from your ratings. Reviews happened once a year.
- 1 – 5: The new system starting this summer. A 1 means the top 20 percent, 2 is the next 20 percent, 3 is the middle 40 percent, 4 is the lower 13 percent, and 5 is the bottom 7 percent. There is no separate stock/contribution ranking. All divisions have a calibration based on discipline and career stage. The approach is dictated by HR (though I imagine different vice presidents will introduce small variations), and your rewards are directly determined by your rating (with the very top 5 percent receiving extra). Reviews happen once a year.
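To make the boundaries of the 1 to 5 system concrete, here is a minimal Python sketch that maps a position in a calibration group to a rating. It assumes a strict best-to-worst ordering, which is a simplification: real calibration is a discussion among managers, not a formula. The `rating_for` function and the `CUTOFFS` table are hypothetical names for illustration; only the percentages come from the description above.

```python
# Cumulative upper bounds (percent of the group) for ratings 1 through 5,
# derived from the 20/20/40/13/7 split.
CUTOFFS = [(1, 20), (2, 40), (3, 80), (4, 93), (5, 100)]


def rating_for(rank, group_size):
    """Map a hypothetical rank (1 = strongest) to a rating from 1 to 5."""
    if not 1 <= rank <= group_size:
        raise ValueError("rank must be between 1 and group_size")
    for rating, cutoff in CUTOFFS:
        # Integer comparison avoids floating-point rounding at the boundaries.
        if rank * 100 <= cutoff * group_size:
            return rating
    raise AssertionError("unreachable: the last cutoff is 100 percent")


# In a group of 100, positions 1-20 map to a 1, 21-40 to a 2, and so on.
for rank in (1, 20, 21, 40, 41, 80, 81, 93, 94, 100):
    print(rank, rating_for(rank, 100))
```

The extra reward for the very top 5 percent is left out here, since it is not a separate rating in this model.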
Personally, I like the new system. We’ll have to see how it works out, but initially I like the simplicity of a single rating, that it’s on a curve that better matches historic percentages, that there’s no cross-discipline calibration (it made no sense), and that there is a standard approach across divisions. And I am overjoyed with the direct, fixed, and transparent tying of rewards to ratings.
I can think of three more improvements beyond the ones I mention in the column.
- Have six ratings (a separate one for the top 5 percent). With everything else so transparent, why hide the top 5 percent?
- Get credit toward your bottom 7 percent if you worked with HR to dismiss or implement a career change for struggling employees during the past review cycle. You can’t claim people who simply transferred or left—HR has to have been actively engaged regarding these folks who would have otherwise gotten a 5 rating. This would encourage managers to connect with HR and actively manage performance issues.
- Do calibration twice a year, and share the midyear calibration rating (perhaps tied to a semi-annual bonus, as it used to be, or used just as a tracking number). As a manager, I hate that I can’t provide my employees an unambiguous message about their calibrated performance at midyear. I can tell people they are in jeopardy, but psychologically that’s not the same as giving them a number.