Feedback and Engineering Windows 7

Just about every email we receive and every comment we get comes with feedback—something to change, something to do more of, something to do less of, and so on. As we’ve talked about on this blog, acting on each one in an affirmative manner is easier said than done. What we can say for certain is that we are listening to each and every comment, blog post, news story, MS Connect report, Send Feedback item, and of course all the data and telemetry. This post kicks off the discussion of changes made to the product with an overview of the feedback process. We'll get into specific changes shortly, and we'll continue to return to the theme of changes in the Release Candidate (RC) over the next weeks. Yesterday on the IE Blog you saw that we'll be updating IE 8 on Windows 7, and there we also talked about the feedback process in general.

Feedback about Windows 7 of course starts before we've written any code, and by the time we've got running code, thousands of people outside of Microsoft have provided input and influenced the feature set and design of Windows 7. As we've seen, the input from even a small set of customers can often represent a wide variety of choices--often in alignment, but just as often in opposition. As we develop the features for Windows 7 we work closely with PC makers, enterprise customers, and all types of customers across small business, education, enthusiasts, product reviewers, industry "thought leaders", and so on. We shape the overall "blueprint" of the release based on this wide variety of input. Once we have design prototypes or running code, we gather much more targeted and specific feedback using tools such as usability tests, concept tests, benchmark studies, and other techniques to validate the implementation of this blueprint. Our goal is for this level of feedback to be representative of the broad set of Windows customers, even if we don't have a 1:1 interaction with each and every customer. Hopefully this post will offer some insight into this process overall--the tools and techniques, and the scope of feedback.

In the first few weeks of the Windows 7 beta we had over one million people install and use Windows 7. That's an astounding number for any beta test, and while we know it has been fun for many folks, it has been a lot of work for us--but work that helps to raise the quality of Windows 7. When you use the beta you are automatically enrolled in our Customer Experience Improvement Program (anonymous feedback and telemetry, which is voluntary and opt-in in the RTM release). Just by using Windows 7 as a beta tester you are helping to improve the product--you are providing feedback that we are acting on in a systematic manner. Here is a sense of the scale of feedback we are talking about:

  • During a peak week in January we received one Send Feedback report every 15 seconds, and to date we’ve received well over 500,000 of these reports. That averages to over 500 reports for each and every developer to look through (see the quick arithmetic check after this list)! And we're only 6 weeks into using the Windows 7 beta, even though for many, Windows 7 already seems like an old friend.
  • To date, with the wide usage of the Windows 7 Beta, we have received hundreds of Connect (the MSDN/TechNet enrolled beta customers) bug reports and have fixes in the pipeline for a higher percentage of those reported bugs than in any previous Windows development cycle.
  • To date, we have fixes in the pipeline for nearly 2,000 bugs in Windows code (not in third-party drivers or applications) that caused crashes or hangs. While many beta customers have said they are very happy with the quality of Windows 7, we are working to make it even better by making sure we fix the issues experienced by such broad and significant usage.
  • To date, we have recorded over 10,000,000 device installations, and over 75% of these were able to use drivers provided in the box (that is, no download necessary). The remaining devices were almost all served by downloading drivers from Windows Update or by direct links to the manufacturer's web site. We've recorded the usage of over 2.8M unique plug-and-play device identifiers.
  • On a personal note, I've received and answered almost 2,000 email messages from folks all around the world, just since this blog started in August.  I really appreciate the discussion we're having and am doing my best to keep up with all the mail.
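
If you want to check the arithmetic, here is a quick back-of-the-envelope sketch (in Python, purely illustrative); the roughly 1,000-developer team size is an inference from the per-developer figures above, not a number stated in this post:

    # Back-of-the-envelope check of the feedback volume figures above.
    # The ~1,000 developer team size is an assumption inferred from
    # "over 500 reports for each and every developer", not a stated number.
    SECONDS_PER_WEEK = 7 * 24 * 60 * 60        # 604,800
    reports_peak_week = SECONDS_PER_WEEK / 15  # one report every 15 seconds
    total_reports = 500_000                    # "well over 500,000" to date
    developers = 1_000                         # assumed team size

    print(f"Peak week: {reports_peak_week:,.0f} reports")                    # ~40,320
    print(f"Per developer that week: {reports_peak_week / developers:.0f}")  # ~40
    print(f"Per developer to date: {total_reports / developers:.0f}")        # 500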

We have a variety of tools we draw on to help inform the decision-making process. A key element we have focused on quite a bit in Windows 7 is the role of data in making decisions. Everything we do is a judgment call, as ultimately product development is about deciding what to get done from an infinite set of possibilities, but the role of data is essential and has become far more routine and critical. It is important to be super clear—data is not a substitute for good judgment or an excuse to make a decision one way or another, but it most definitely informs the decision. This is especially true in an era where the data is not only a survey or focus group, but often a “sampling” of millions of people using Windows over an extended period.

A quick story from my years working on Office. Many years ago, before telemetry and the internet, deciding what features to put in a release of Office could best be described as a battle. The battle took place in conference rooms where people would basically debate until one or more parties gave up from fatigue (mental or otherwise)—essentially adrenaline-based product development. The last person standing, the one with the most endurance, or the one who pulled an all-nighter to write the code pretty much determined how features ended up, or which features ended up in the product at all. Sort of like turning feature design over to a Survivor-like process. I’m sure many of you are familiar with this sort of process. The challenges with this approach are numerous, but inevitably features do not hold together well (in terms of scenarios or architecture), the product lacks coherency, and most importantly, unless you happen to have a good match between the “winner” and the target customers, features will often miss the mark.

In the early 1990s we started instrumenting Word and learning about how people actually used the software (this was before the internet, so we built a special version of the product that volunteers ran, and then we collected the data via lots of floppies). We would compile the data and learn which features people used and how much they used them. We learned things such as how much more people used tables than we thought, but for things very different from tables. We learned that a very significant amount of the time the first suggestion in the spelling dictionary was the right correction (hence autocorrect). We learned that no one ever read the tip of the day (“Don’t run with scissors”). This data enabled us to make real decisions about what to fix, to measure the impact of changes, and then, looking at the goals (the resulting documents), to decide what direction to take word processing.
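
As a purely illustrative aside, the rule that data suggested can be sketched in a few lines; the toy dictionary below is made up and bears no relation to Word's actual speller:

    # Purely illustrative sketch of the rule the Word data suggested:
    # apply the spelling checker's first suggestion automatically.
    # The dictionary here is a toy stand-in, not Word's speller.
    SUGGESTIONS = {
        "teh": ["the", "ten"],
        "recieve": ["receive", "relieve"],
        "adn": ["and", "an"],
    }

    def autocorrect(word: str) -> str:
        """Return the first suggestion for a misspelling, else the word."""
        suggestions = SUGGESTIONS.get(word.lower())
        return suggestions[0] if suggestions else word

    print(autocorrect("teh"))     # -> "the"
    print(autocorrect("window"))  # -> "window" (not a known misspelling)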

Fast forward to the development of Windows 7 and we’re focused on using data to help inform the decisions we make. This data takes many forms and helps in many ways. I know a lot of folks have questions about the data – is it representative, how does it help fix things people should be using but don’t, what about doing new things, and so on. Data is an important element of making decisions, but not a substitute for clear product goals, meaningful customer engagement, and working across the ecosystem to bring Windows 7 to customers.

Let’s talk a bit about “bugs”. Up front, it is worth making sure we’re on the same page when we use the much-overloaded term bug. For us, a bug is any time the software does something that someone wasn’t expecting it to do. A bug can be a cosmetic issue, a consistency issue, a crash, a hang, a failure to succeed, a confusing user experience, a compatibility issue, a missing feature, or any one of dozens of different ways that the software can behave in a way that isn’t expected. A bug for us is not an emotional term, but just shorthand for an entry in our database representing feedback on the product. Bugs can be reported by a human or by the various forms of telemetry built into Windows 7. This broad definition allows us to track and catalog everything experienced in the product, and to do so in a uniform manner.
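
To make “an entry in our database” a bit more concrete, here is a hypothetical sketch of the kind of information such a record might capture; every field and name below is invented for illustration and is not our actual schema:

    # Hypothetical sketch only; the field and type names are invented
    # and do not reflect the actual internal bug database schema.
    from dataclasses import dataclass
    from enum import Enum, auto

    class BugKind(Enum):
        COSMETIC = auto()
        CONSISTENCY = auto()
        CRASH = auto()
        HANG = auto()
        COMPATIBILITY = auto()
        CONFUSING_UX = auto()
        MISSING_FEATURE = auto()

    class Source(Enum):
        HUMAN = auto()      # Send Feedback, Connect, email, blog comments
        TELEMETRY = auto()  # automatic crash/hang and usage reporting

    @dataclass
    class Bug:
        bug_id: int
        kind: BugKind
        source: Source
        title: str
        hit_count: int = 1  # how many reports roll up into this entry

    example = Bug(12345, BugKind.HANG, Source.TELEMETRY,
                  "Shell hangs when ejecting a USB drive")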

Briefly, it is worth considering a few types of data that help to inform decisions as some examples.

  • Customer Experience Improvement Program. The CEIP covers the full set of data collected on your PC that is provided to Microsoft in an anonymous, private, and opt-in manner. During the beta, as we state, this is defaulted on. In the retail product, of course, it is optional. Over the course of the beta we are seeing data about usage of new features, where people are customizing the product, what commands are being used, and in general how Windows 7 is being used. You’ve seen us talk about some of this data from Windows Vista that informed the features of Windows 7, such as the display resolution being used or the number of accounts on a machine. There are many data points measured across the product. In fact, an important part of the development cycle is to make sure that new features are well instrumented to inform us of usage during the beta and down the road.
  • Telemetry. While related to CEIP in the programmatic sense, we look at telemetry in a slightly different manner, and you’ve seen this at work in how we talk about system performance or about the diversity of devices, such as our discussion of high DPI support. Throughout the course of the beta we are able to see how boot time evolves or which devices are successfully installed or not. Important elements of telemetry that inform which bugs we fix are how frequently we see a crash or a hang. We can identify software causing a higher level of issues, and the right team or ISV can know to work on the issue. The telemetry really helps us focus on the benefit of the change—fixing a bug that represents thousands of customers, a widely used device, or broadly used third-party software has a much bigger impact than fixing one that affects only a few people, a lower-volume device, or less widely used software. With this data we can more precisely evaluate the benefit of changes (a toy sketch of this kind of triage follows this list).
  • Scenario-based tests. During the course of developing a feature we can take our designs and prototypes (code, paper, or bitmaps) and create a structured study of how customers would interpret and value a feature/scenario. For example, early in the planning of Windows 7 we created a full working prototype of the taskbar enhancements. With this prototype we can study different types of customers (skill levels, familiarity with different versions of Windows, competitive product customers, IT pro or end-user) and how they react to a well-defined series of “tasks”. This allows a much more detailed study of the feature. As with all tests, these are not a substitute for good judgment in a broader context, but a key element to inform decisions.
  • Benchmarking studies. As we transitioned to the pre-beta we started to have real code across the whole product, so we began validating Windows 7 with real code in real-world scenarios. We call these studies benchmarking because often we are benchmarking the new product against a baseline of the previous version(s) of Windows. We might do a study where we see how long it takes to share a printer in the home and then compare that time-to-complete and success rate against a Windows 7 test using HomeGroup (a toy version of such a comparison follows this list). We might compare setting up a wireless network with and without WPA. We have many of these types of benchmarks and work to make sure that we understand both the progress we’ve made and where we might need to improve documentation, tutorials, or other forms of assistance.
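
To make the telemetry-driven triage mentioned above concrete, here is a toy sketch: rank crash/hang “buckets” by how many distinct machines they affect, with an owner indicating the resolution path. All names and counts below are invented:

    # Hypothetical sketch of impact-based triage: rank crash/hang
    # "buckets" by how many distinct machines they affect. The owner
    # indicates the resolution path (Windows code, IHV driver, or ISV
    # software). All names and counts are invented.
    crash_buckets = [
        {"bucket": "somedriver.sys!Reset",   "owner": "IHV driver",   "machines": 48_200},
        {"bucket": "explorer.exe!Thumbnail", "owner": "Windows",      "machines": 9_750},
        {"bucket": "someapp.exe!Startup",    "owner": "ISV software", "machines": 310},
    ]

    # Highest-impact issues first.
    for b in sorted(crash_buckets, key=lambda b: b["machines"], reverse=True):
        print(f'{b["machines"]:>7,}  {b["owner"]:<13}{b["bucket"]}')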
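
And here is a toy version of a benchmarking comparison like the printer-sharing example: time-to-complete and success rate on a previous version of Windows versus Windows 7 with HomeGroup. All the timing data is made up:

    # Toy benchmarking comparison: time-to-complete and success rate
    # for sharing a printer, previous Windows versus Windows 7 with
    # HomeGroup. All timing data here is made up for illustration.
    previous = [412, 380, None, 455, 390, None, 420]  # seconds; None = gave up
    homegroup = [95, 110, 88, 102, None, 97, 90]

    def summarize(samples):
        completed = [s for s in samples if s is not None]
        return len(completed) / len(samples), sum(completed) / len(completed)

    for name, data in (("Previous Windows", previous), ("Win7 HomeGroup", homegroup)):
        rate, mean = summarize(data)
        print(f"{name}: {rate:.0%} completed, mean {mean:.0f}s")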

This type of feedback all represents structured feedback, in that the data is collected based on a systematic study and usually has a hypothesis associated with it. We also have unstructured feedback, which represents the vast array of bug reports, comments, questions, and points of view expressed in blogs, newsgroups, and the Send Feedback button—unstructured because it is not collected in a systematic manner, but aggressively collected by any and all means. A special form of this input is the bug reporting done through the Connect program—the technical beta—which represents bug reports, feature suggestions, and comments from this set of participants.

The Windows 7 beta represents a new level of feedback in this regard, in terms of the overall volume we talked about above. If you consider the size of the development team and the time it would take just to read the reports, you can imagine that merely digesting (categorizing, understanding, flagging) the issues, let alone responding to them, is a massive undertaking (about 40 Send Feedback reports per developer during that one peak week, though as you can imagine they are not evenly distributed across teams).

The challenge of how to incorporate all the feedback at this stage in the cycle is significant. It is emotional for us at Microsoft, and a source of both considerable pride and some consternation. We often say “no matter what happens, someone always said it would.” By that we mean that on any given issue you can be assured that all sides will be represented by passionate and informed views of how to resolve it, often in direct opposition to each other, plus every view in the middle. That means for the vast majority of issues there is no right or wrong in an absolute sense, only a good decision within the context of a given situation. We see this quite a bit in the debates about how features should work—multiple solutions are proposed and debated in comments on a blog (people even write whole blogs about how things should work). But ultimately, on the Windows development team we have to make a call: as we're seeing, a lot of people are looking forward to us finishing Windows 7, which means we need to stop changing the product and ship it. We might not always make the right call, and we'll admit it when we don't, even if we find that changing the behavior is not possible.

Making these decisions is the job of program management (PM). PMs don’t lock themselves in their offices and issue opinions; more realistically, they gather all the facts, data, and points of view, and work to synthesize the best approach for a given situation. Program management’s role is making sure all the voices are heard, including beta testers, development, testing, sales, marketing, design, customer support, other teams, ISVs, IHVs, and on and on. Their role is to synthesize and represent these points of view systematically.

There are many factors that go into understanding a given choice:

  • What is it supposed to do? At the highest level, the first question to ask is how something is supposed to work. Sometimes things are totally broken. We see this with many, many beta issues around crashes and hangs, for example. There’s not a lot of debate over these, since if something crashes with any meaningful frequency (based on telemetry) it should be fixed. We know if it crashes for you then it is a “must fix”, but we are looking across the whole base of customers and understanding the frequency of a crash, and also whether the code is in Windows, a driver from a hardware maker, or software from a third party—each of those has a different potential resolution path to consider. When it comes to user interaction there are two elements of “supposed to do”. First, there’s the overall scenario goal, and then there’s the feedback from different people, with different experiences (and opinions), about what it should do. As an example, when we talked about HomeGroup and the password/passphrase there was a bunch of feedback over how this should work (an area we will be tweaking based in part on this feedback). We of course have specifications and prototypes, but we also have a fluidity to our development process such that we do not have 100% fidelity before we have the product working (akin to architectural blueprints that leave tons of decisions to be made by the general contractor or decided while construction is taking place). There are also always areas in the beta where the feature is complete but we are already on a path to “polish” the experience.
  • How big is the benefit? Say we decide something is supposed to behave differently. Will it be twice as good? Will it be 5% better? Will anyone notice? This is always a great discussion point. Of course, people who advocate for a change are always convinced that the change will prevent the feature from being “brain dead”, or that “if you don’t change this then the feature is dead”. We see this a lot around “discoverability”, for example—people want to put something front and center as a way of fixing it. We also see many suggestions along the lines of “make it configurable”. Both of these have benefits in the near term, of course, but both also add complexity down the road in terms of configurations, legacy user interface, and so on. Often it is important to look at the benefit in a broader context, such as how frequently something will be executed by a given person or what percentage of customers will ultimately take advantage of the improvement. It is not uncommon internally to see folks extrapolate instantly to “everyone does this”!
  • How big is the change? Early in the product cycle we are making lots of changes to the code—adding new code, rearchitecting, and moving things around a lot. We don’t do so willy-nilly, of course, but the reality is that early in the cycle there is time for us to manage through the process of substantially changed code and the associated regressions that will happen. We write specifications and have clear views of features (scenario plans, prototypes, and so on) because we know that as the project progresses the cost of making big changes goes up. The cost increases because there is less time, but also because a big change late in the cycle to a large system is not prudent engineering. So as we consider changes we also have to consider how big a change is in order to understand its impact across the system. Sometimes a change can be big in terms of lines of code, and lots of code is always risky. But more often the measure is not the number of lines, but the number of places the code is connected—so while the change sounds like a simple “if” statement, it is often more complex than that. Over the years, many have talked about componentization and other systems-engineering ways to reduce the impact of change, and of course Windows is very much a layered system. The reality is that even in a well-layered system, it is unlikely one can change things at the bottom without assumptions about behavior carrying forward through the upper layers. This “defensiveness” is an attitude we maintain consistently throughout our development process because of the responsibility we feel to maintain compatibility, stability, performance, and reliability.
  • How costly is the change relative to the benefit? Change means something is different. So any time we change something, people need to react. Often we are deliberate in change, and we see this in user interface, driver models, and so on. When changes are deliberate, people can prepare, and we can provide tools to help with the transition. We’ve seen a lot of comments about new features that react to the cost of change. Many times this commentary is independent of the benefit and just focuses on the change itself. This type of dialog makes it clear that change itself is not always good. With many bug reports we hear “this has been in Windows for 3 versions and must be fixed in Windows 7”. Over many releases of Windows we have learned that behaviors in the system, particularly in APIs, message order and semantics, or interfaces, might not be ideal, but changing them introduces more complexity, incompatibilities, and problems for people than the benefit of the change. Some view these decisions as “holding us back”, but more often than not it would be a break from the past one day only to create a new past to break from the next. The existing behavior, whether it is an API or a user interface, defines a contract we have, and part of building a release is making sure we have a well-understood cost/benefit view (a toy illustration follows this list), knowing that as with any aspect of the system, different people will have different views of this “equation”.
  • In the context of the whole release, how important is this issue? All decisions need to be made in the context of the broader goals of the release. Each release stands for a set of core scenarios and principles that define it. By definition, that means in each release some things will change more than others and some things might not change at all. Said another way, some parts of the system will be actively worked on toward a set of goals while we keep other parts of the system more or less “stable” release over release. It means that things you might want to see changed might not change, just because that is an area of the product we’re not mucking with during Windows 7. As we’ve talked about, for Windows 7 we put a lot of work into various elements of system performance. Aside from the obvious scenario planning and measurement, we also took very seriously areas of the system that needed to change to move us forward. Likewise, areas of the system where the performance gain would not be significant enough to warrant change did not change that much. We carry this forward through the whole cycle as we receive data and telemetry.
  • How does the change impact security, reliability, performance, compatibility, localizability, accessibility, programmability, manageability, customizability, and so on? The list of “abilities” that it takes to deliver Windows is rather significant. Members of our development team receive ongoing training and information on delivering on all of these abilities so we do a great job across the product. In addition, for many of these abilities we have members of the team dedicated full time to delivering on them and making sure we do a good job across the product. Balancing any change or input against all of these abilities is itself a significant undertaking and an important part of the research. Often we see input that is very focused on one ability but goes counter to another—it is easy to make a change to provide customization, for example, but then this change must also be customizable for administrators, end-users, and PC makers. Such complexity is inherent in the very different scenarios for usage, deployment, and management of PCs. The biggest area where folks see us considering this type of impact is when it comes to changing behavior that “has been in the product forever”. Sometimes an arbitrary decision made a while back is best left as is in order to maintain the characteristics of the subsystem. We know that replacing one old choice with a new implementation just resets the clock on things that folks would like to see be different—because needs change, perspectives change, and people change.
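
None of this weighing reduces to a literal formula, but as a toy illustration of how benefit (how many people, how often, how much better) trades against cost (how much code changes, how connected it is, how late in the cycle we are), consider the following sketch; every function and number in it is invented:

    # Toy illustration of the cost/benefit weighing described above; the
    # formula and every number are invented, and nothing this literal is
    # used in practice.
    def benefit(users: int, uses_per_week: float, improvement: float) -> float:
        return users * uses_per_week * improvement

    def cost(lines_changed: int, connection_points: int, weeks_to_ship: int) -> float:
        lateness = 1.0 / max(weeks_to_ship, 1)  # the same change costs more later
        return (lines_changed + 50 * connection_points) * (1 + 10 * lateness)

    # A small, contained fix to a frequent annoyance, early in the cycle...
    print(benefit(2_000_000, 5.0, 0.05) / cost(40, 1, 26))
    # ...versus a large rearchitecture of the same area, late in the cycle.
    print(benefit(2_000_000, 5.0, 0.05) / cost(5_000, 30, 4))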

These are just a few of the factors that go into considering a product change. As you can see, this is not something we take lightly, and a lot goes into each and every change. We consider all the inputs we have and gather all the data we can. In some ways it is easy to freeze when thinking about the decisions we must make to release Windows 7—if you think too hard about a decision you might start to worry about a billion people relying on something, and it gets very tricky. So we use data to keep ourselves objective and to keep the decision process informed and repeatable. We are always humbled by the responsibility we have.

While writing this post, I received a “bug report” email with the explicit statement “is Microsoft going to sidestep this issue despite the magnitude of the problem” along with the inevitable “Microsoft never listens to feedback”. Receiving mail like this is tough—we’re in the doghouse before we even start. The sender has decided that this report is symbolic of Microsoft’s inability or lack of desire to incorporate critical feedback and to fix must-fix bugs during development, and that Microsoft is too focused on shipping to do the right thing. I feel stuck, because the only answer being looked for is the fix, and anything less is a problem or further proof of our failure. And in the back of my mind is the reality that this is just one person with one issue whom I happen to be talking to in email. There are over a couple of million people using the beta, and if each one, or for that matter just one out of 10, has some unique change, bug fix, or must-do work item, we would have literally years of work just to make our way through that list. And if you consider that we might easily get 1,000,000 submitted new “work items” for a product cycle, even if we do 100,000 of them it means we have 900,000 folks who feel we don’t listen, compared to the 100,000 folks who feel listened to. Perhaps that puts the challenge in context.

With this post we tried to look at some of the ways we think about the feedback we’re getting and how we evaluate it in the course of developing Windows 7. No area is more complex than balancing the needs (and desires) of such a large and diverse population—end-users, developers, IT professionals, hardware makers, PC manufacturers, silicon partners, software vendors, PC enthusiasts, sysadmins, and so on. A key reason we augment our approach with data and with studies that deliberately select for representative groups of “users” is that it is important to avoid the “tyranny of the majority” or “rule by the crowd”. In a sense, the lesson we learned from adrenaline-based development was the value of being systematic, representative, and as scientific as possible in the use of data.

The work of acting on feedback responsibly and managing the development of Windows through all phases of the process is something we are very sincere about. Internally we’ve talked a lot about being a learning organization, always learning how to do a better job, improve the work we do, and in the process make Windows even better. We take this approach as individuals and in how we view building Windows. We know we will continue to have tough choices to make, as everyone who builds products understands, and what you have is our commitment to continue using all the tools available to make sure we are building the best Windows 7 we can build.

--Steven