So what happened?

Now that we have had time to dig into the details of what caused the temporary problems last weekend, I want to share what we have found so far. We still have more work to do, but we now understand what occurred and why. We’ve also implemented several changes to prevent this kind of thing from happening again.

What exactly happened?

Two key things happened. First, activations and validations were both affected when preproduction code was accidentally sent to production servers. Second, while the issue affecting activations was fixed in less than thirty minutes (by rolling back the changes), the effect of the preproduction code on our validation service continued after the rollback took place.

How did this happen in the first place?

Nothing more than human error started it all. Preproduction code was sent to production servers. The production servers had not yet been upgraded with a recent change to enable stronger encryption/decryption of product keys during the activation and validation processes. As a result, the production servers declined activation and validation requests that should have passed.

Why did it take so long to fix?

While the response to the activation issue was quick (less than thirty minutes), the effect on our validation service continued even after the rollback took place. We expected the rollback to fix both issues at the same time, but we now realize that we didn’t have the right monitoring in place to be sure the fixes had the intended effect.
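
One way to picture the missing check: after a rollback, send a known-good request to each service and treat the rollback as complete only when every one of them passes. The sketch below is an illustration only. The function name `failing_services` and the stand-in probes are hypothetical, and it does not describe the monitoring actually in place.

```python
from typing import Callable, Dict, List


def failing_services(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run one known-good request per service and return the services that still fail."""
    failures = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            # A probe that cannot complete counts as a failure, not a pass.
            ok = False
        if not ok:
            failures.append(name)
    return failures


# Hypothetical stand-ins: activation recovered after the rollback, validation did not.
probes = {
    "activation": lambda: True,
    "validation": lambda: False,
}

still_failing = failing_services(probes)
if still_failing:
    print("Rollback incomplete, still failing:", ", ".join(still_failing))
else:
    print("Rollback verified for all services")
```

Checking each service separately matters here because the rollback fixed one service and not the other; a single combined check could have hidden that.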

If the servers are down, why don’t you just assume the systems are genuine?

We do. It’s important to clarify that this event was not an outage. Our system is designed to default to genuine if the service is disrupted or unavailable. In other words, we designed WGA to give the benefit of the doubt to our customers.  If our servers are down, your system will pass validation every time. This event was not the same as an outage because in this case the trusted source of validations itself responded incorrectly.
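
The difference between an outage and an incorrect response can be shown with a small sketch. It assumes a hypothetical client function that either returns the service's answer or raises `ServiceUnavailable`; none of these names come from WGA itself. When the service cannot be reached, the result defaults to genuine. When the service answers, the answer is trusted, which is why a wrong answer gets through.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    GENUINE = "genuine"
    NOT_GENUINE = "not genuine"


class ServiceUnavailable(Exception):
    """Hypothetical error raised when the validation service cannot be reached."""


def validate(product_key: str, ask_service: Callable[[str], bool]) -> Status:
    try:
        accepted = ask_service(product_key)
    except ServiceUnavailable:
        # Outage: give the customer the benefit of the doubt.
        return Status.GENUINE
    # The service answered, so its answer is trusted, even if it is wrong.
    return Status.GENUINE if accepted else Status.NOT_GENUINE


def healthy_service(key: str) -> bool:
    return True  # correctly accepts a valid key


def unreachable_service(key: str) -> bool:
    raise ServiceUnavailable("no response")


def misconfigured_service(key: str) -> bool:
    return False  # wrongly declines a valid key, as happened in this event


for name, service in [
    ("healthy", healthy_service),
    ("unreachable", unreachable_service),
    ("misconfigured", misconfigured_service),
]:
    print(name, "->", validate("EXAMPLE-KEY", service).value)
```

Only the third case produces a failure for a valid key: the default-to-genuine path is never taken because the service did respond.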

What changes have you made?

We have implemented several changes to address the specific issues that took place over the weekend – for example we are improving our monitoring capabilities to alert us much sooner should anything like this happen again. We’re also working through a list of additional changes such as increasing the speed of escalations and adding checkpoints before changes can be made to production servers.

Why were some customers told that this problem might continue for days?

As I mentioned in my post yesterday, we erroneously said the servers might be down until Tuesday, when in fact they had already been fixed as of late Saturday morning Pacific Time. We’re reviewing our procedures on that score as well, because communicating clearly and accurately is super important when things like this happen.

What were customers experiencing?

For the customers who failed validation from Friday afternoon through Saturday morning, the experience was that features we refer to as ‘genuine-only’ features were disabled. These features are Windows Aero, Windows ReadyBoost, Windows Defender (in this state Defender will scan and identify all threats it ordinarily would, but will only clean ones marked ‘severe’) and Windows Update (in this state only ‘optional’ updates are unavailable; all others can still be downloaded, including security updates). A message also appears in the lower right-hand corner of the desktop. It reads ‘This copy of Windows is not genuine’ and persists until a successful validation is performed.

The form of validation failure experienced by those affected on late Friday and early Saturday DID NOT result in the beginning of the 30-day grace period during which activation is required. Nor was there any 3-day period during which a customer was required to do anything related to this issue. Disabling the genuine-only features is meant to provide notice to the customer of the state of the system. When disabled, the features present their own error messages relating to the system not being genuine. It’s unfortunate this happened to users with genuine systems.


I also want everyone to know that I am personally very disappointed that this event occurred. As an organization we’ve come a long way since this program began and it’s difficult knowing that this event confused, inconvenienced, and upset our customers. 

As always, please send your feedback to me through the blog (you can use the email link in the upper left hand corner of the page) or post comments.



Comments (25)

  1. quux says:

    Thanks for ‘coming clean’ about the problem; I know that takes a lot of guts to do, even though once it’s done, it seemed so simple.

    I highly encourage more transparency in both the WGA and Activations efforts at Microsoft. I’m pretty sure it will do you guys a lot of good in the long run. There are still many people with grave concerns as I’m sure you know! But, again, a hearty slap on the back for honestly admitting the slip-up.

  2. 4sysops says:

    It was hard to miss the news about the WGA (Windows Genuine Advantage (?)) outage Microsoft had this weekend. Just in case you managed it somehow, you might want to catch up on it in this Computerworld article. Microsoft’s Windows Genuine Advanta..

  3. Just what you need on a hot summer weekend – Microsoft’s Windows Genuine Advantage (WGA) online copy protection system goes on the blink and now your Windows XP or Vista machine thinks it is running a ripped-off copy of the operating system. ..

  4. says:

    The human error is that the WGA code auto-disables unless it is validated, as opposed to being disabled when checked and invalidated. That’s a design flaw. The second human error is WGA itself. Simply verifying that you have a valid Windows install when you connect to the update servers, notifying the user and not allowing updates unless you are legit, should be enough to handle the anti-piracy goals WGA is meant to address. Remotely disabling OSes is on the same order as a coder putting  a logic bomb in their code to ensure payment, and, IMO, should be just as illegal.

  5. mgb1unc says:

    It’s time to end this nonsense known as WGA.  Enough is enough.  

  6. fin_head says:

    I commend you for figuring this out, and posting the postmortem.  That kind of transparency is hard to do, but ultimately it is the right thing to do.

    Nonetheless, a legitimate user should never suddenly be denied access to “genuine-only” features because of human, server, or network error in Redmond.

    Until you can verify that will be the case, there is no advantage in WGA, and Microsoft is just blatantly treading on the backs of its legitimate customers.

  7. jkipk says:

    I find it interesting that this was a check-in problem.  With all the work that went into check-ins with Vista, it seems inconceivable that code can be handled apparently so haphazardly.  The effects of this are damaging and widespread.  Hopefully some heads are rolling, because it may take years to gain back what has been lost.

  8. bevhoward says:

    The users being hit by this type of failure are the users with “install updates automatically” set to on, especially since WGA presumes any problem is illegal until proven otherwise with no grace period, and, from the user side, the danger seems to be escalating.

    There are very few days when most users can afford to wait for inevitable mistakes such as these to be addressed.

    At the very least, implement a grace period.

    Beverly Howard [MS MVP-Mobile Devices]

  9. In a day of ironies… Windows XP sp3 is announced…. Windows Vista SP1 and Windows XP SP3 Announcement

  10. Brings back an old saying from when I first entered the IT field several years ago, "To err is human

  11. Brings back an old saying from when I first entered the IT field several years ago, "To err is human

  12. linux ftw says:

    …thats why Microsoft Products shouldn’t be used. One big company has the power over YOUR

  13. Proeller says:

    First of all, thank you for being so honest!

    But let me tell you, what I was experiencing last Saturday morning to noon (I’m living in Germany). I was planning to do some urgent bugfixes to one of my applications. But first, I wanted to install the Vista Explorer bugfix…

    When I realized, that my Vista Enterprise Edition wasn’t valid anymore, I started blaming my son and the friend of my daughter, that anyone of them must have stolen my MSDN Volume License Key. I was absolutely sure, that this must have been the cause, why my legal Vista installation was no longer valid.

    They are both still angry with me because I made that unfair accusation to them.

    Then I spent two hours on the phone and on the internet, desperately trying to fix the mistake. None of the people to whom I spoke knew anything about the problem. They even questioned the product key.

    As a serious software consultant I cannot risk being accused of using illegal software. Also, debugging and testing on a somehow “locked” system doesn’t make much sense because you can’t say, if a certain bug is related to your software or to the locked down OS.

    As a developer, I have been working with Microsoft for more than 20 years. But this incident increases my growing doubts about whether Windows is really the platform I should focus on in the future.

  14. Further investigation by our WGA team has brought to light more information on the WGA validation issue

  15. ioniancat21 says:

    It would almost seem as if Microsoft is purposefully hurting their reputation with their “paying” customer. Now Microsoft has taken down the Autopatcher site for copyright issues. Again, Microsoft is a master at shooting the customer in the foot.

    I know Microsoft “thinks” people will use Autopatcher to circumvent WGA. Guess what, users with Paradox are still patching their machines without Autopatcher. If Microsoft did their homework they would know Autopatcher is used to speed up the update process, not work around it. When Microsoft releases Vista SP1 and XP SP3 to the public, we’ll see how many crackers need Autopatcher to apply the SP’s in a couple of months!!!

  16. zed260222 says:

    what you need to do is have it so that when it fails on 1 server to detect genuine windows have it automatically check another server only when it fails both server checks then mark it non genuine

  17. chakkaradeep says:

    Thanks Alex for explaining about the issue. I am really happy that things are under control 🙂

  18. EricC says:

    So if the system is ‘designed to default to genuine’, then why not simulate an outage by disabling connectivity while the rollback was occurring? This would allow your customers to continue without the erroneous validations. This would have been a non-story. Seems so obvious, I’m sure it was considered (??)

  19. hklm says:

    I find this an unfortunate and occasional MSFT ostrich approach, Alex.  There are still WGA problems, and if someone is a customer of Office 2007, when they call your Canadian contractors, Convergys of Ohio, to resolve one, they are told that they have support that would start if they invoked it for 90 days and then the “support” is gone.

    This makes no sense whatsoever which is why I worked early on to get myself in a position where I wouldn’t need the support but it is hardly fair to MSFT customers.  They shouldn’t have their support clock running in Office because of WGA team failures.

    I also tried to post this message 12 hours ago and apparently you’re censoring it out which is also unfortunate.

    Pretending problems aren’t there with WGA is not helping MSFT Alex.

    Ed Bott has tracked them as well and I’d say he’s also a reliable source:

  20. hklm says:

    Thanks for sharing your feedback! If your feedback doesn’t appear right away, please be patient as it may take a few minutes to publish – or longer if the blogger is ****censoring comments.

    The WGA site for reporting glitches looks like the morgues in Iraq.

    Skins so thin that you can’t brook constructive criticism are the reasons WGA is such a flawed product and concept.  It does next to nothing to put a dent into all the counterfeit Chinese territory sales of Vista and Office and other MSFT software, and apparently Bill Gates’ dinner with the Human rights tyrant Hu Jintao’s hasn’t put a dent in it either.  

    You have a near 50% rate of legitimate customers that you’re damaging with the WGA fiasco, and you’ve had 2-3 years to get it working.

    Also rubbing out the 90 days of Office support because of a WGA team glitch would be ridiculous if the Office support weren’t incompetently done by Convergys of Ohio minimum-wage butts in seats in Indian cities who can’t fix the many problems we do for your customers on the newsgroups day in and day out.

    We don’t do it “for MSFT”; we do it to protect customers from MSFT’s cheap outsourcing to incompetent Convergys and we’re the competent tech support for your company’s failed PSS.

  21. klink says:

    I can tell you that it is very frustrating to go through Validation issues when you came by your software honestly. I can tell you it doesn’t help to call Microsoft only to get somebody who doesn’t speak my language, in this case English. I can also tell you that it does no good to call if you can’t get a knowledgeable person. What I can’t tell you is how to fix the issues. Save a buck…ok use foreigners. Stop piracy…ok Make the software too aggravating to tolerate. You know. I’m not really a "I hate Microsoft" person. I don’t know that I’m your biggest fan either. I am however a pretty fair pc user and if I have validation I KNOW my po’ ole’ folks will. I mean intelligent folks that are not IT or very well versed in OS issues are just screaming! I am! You know there are REAL issues with Vista. BIG ONES. I use XP Media center and Vista on the same unit and well…XP is less aggravating to use but I swear Vista is darn near "crash proof" and believe me I can crash em’ too! You guys really need to carefully consider your next moves. Yes you do. I hear "Vista sucks" from folks who don’t have it! Never used it! I like it but I’m no one compared to the bad press driver issues and software issues have created for you. You really need to re-vamp this validation thing. I do understand paying big bucks for development and marketing, I do. I understand the protect the investment mindset. I wouldn’t want my work ripped off either! I know that as a customer going back to Dos 5 then Windows 3.0 then 3.11 for work groups to where you guys are at now has been interesting to watch. You guys have real competition now! good stuff too! Don’t blow it. Don’t step on us loyal, long term guys. WE are your bread and butter you know! Fix your stuff, Fix validation first! Then get your drivers together. Move forward to XP software compatibility. Last but not least fix the installed or should I say packaged Vista Software. Media center on down. Fix it. Not in a year. Soon. Or people will spend their hard earned money somewhere else. Hey you know what there are free alternatives! Lots of em’.

  22. MSDN Archive says:

    klink, thanks for your comment. I appreciate your passion for us getting the customer experience right. I can tell you we are working hard to deliver the best experience we can and also make progress on piracy. Don’t forget that many many customers are also affected by piracy when they pay for products that aren’t genuine. We have an obligation to the company, shareholders, engineers and customers to help protect everyone from those that would steal and counterfeit our products.

    Thanks again for the comment.


  23. brianrusso says:

    Bit late to the party but I’m certainly glad I haven’t switched to Vista, and with more of my CAD/GIS applications being better supported on other platforms; frankly I have to wonder whether Vista will be the platform I’d choose.

    For me it just comes down to availability and the fact that quite frequently I am working on systems where access to the Internet is either policy-unavailable or simply unavailable; and having these sorts of ‘phone-home’ code additions and/or surreptitious updates is just not acceptable. Bottom-line. Software needs to be reliable and stable for many of us.

    I appreciate you have the company line that piracy affects all of us; but the bottom line is when you have a mission that needs to be accomplished you can’t have software fail because of an auto-update or a WGA lockout. Most programmes have enough problems as-is.

  24. says:

    With SP1, Microsoft plans to ditch the Vista kill switch