One of the many ways we test designs with real people is through usability testing. Although in Office 2007 we’ve greatly expanded the range, scope, and types of testing we’ve done to include everything from remote testing to extremely early deployments to longitudinal studies, we still do our share of standard usability tests.
What is a standard usability test? Well, normally it works like this: a test subject drives to Microsoft and comes into our labs. The usability engineer responsible for the test greets them, answers any questions they might have, and then sits them in front of a computer which is running the software or prototype being evaluated. The usability engineer runs the test from an adjoining room, while the designers and program managers responsible for the feature being tested watch either in person or from their offices via a video feed.
In many tests, the test subject is given a set of tasks to complete, and asked to verbalize their thoughts as they go: what their expectations are, what they’re looking for, if they’re getting frustrated… things like that. Other times, people are given more open-ended tasks, such as “make a document exactly like this printout” or even just “make a nice looking resume.” Sometimes we have people bring their own work to Microsoft and they complete it in our testing environment.
Most times, the usability subject has filled out a screener ahead of time which helps us judge how much of an expert the subject is at using the software being evaluated. The point is not to exclude anyone, but to help us analyze the results–we do test everyone from ultra-novices to super-elite power users.
When a test is done (usually between one and two hours later), the subject is given a software gratuity as thanks for donating their time, and the cycle of improving the design begins anew.
Back when I first was exposed to usability early in my Microsoft career, my expectation was that people were really going to be super-critical. After all, the software is usually in a pretty rough state during the tests and this was people’s one chance to really let Microsoft have it and let out their rage at things not working as they expected them to.
But it turns out that this expectation is generally wrong. In fact, people tend to be much less critical of the software designs they’re testing than they probably should be.
I think of this as a form of “Stockholm Syndrome,” in which a hostage becomes sympathetic to his captors.
Number one, people are coming to our labs as guests of Microsoft. There’s a little piece of human nature which says you don’t go to someone’s house and then insult them. They come to our place, we’re giving them free stuff–no wonder they subconsciously want to please us a little.
Secondly, people have an innate tendency to blame themselves when they can’t complete a task, instead of blaming the software. You hear a lot of “oh, I’m sure this is easy” and “I’m so embarrassed I can’t figure this out.”
Maybe this comes from taking tests when you’re in school, knowing that every question has a correct answer and if you get one wrong, it’s your fault. Maybe computers are still so complex that people feel like they should have to undergo training in order to use them correctly, and failing a task in usability plays on that insecurity.
Whatever the cause, this tendency to not criticize the software is a major risk to the results of standard usability testing. Our usability engineers are well aware of this and take great pains to ask test subjects to be critical and to reassure them that “it’s not a test of you, it’s a test of the software.”
But there’s always the potential for skewed results, and this is one of the reasons we’ve supplemented standard testing with initiatives in which we watch the software being used in the real world–including technology to perform tests remotely on people’s home computers, far away from the bright lights and cognitive din of our on-campus labs.
Interested in participating in usability research? Visit http://www.microsoft.com/usability.