The Microsoft corporate network: 1.7 times worse than hell


Today I'm going to tell a story from 1996. Why? Because I can.

One of the tests performed by Windows Hardware Quality Labs (WHQL) was the NCT packet stress test which had the nickname "Hell". The purpose of the test was to flood a network card with an insane number of packets, in order to see how it handled extreme conditions. It uncovered packet-dropping bugs, timing problems, all sorts of great stuff. Network card vendors used it to determine what size internal hardware buffers should be in order to cover "all reasonable network traffic scenarios".
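
The NCT tool itself isn't described here, but the basic shape of a packet stress test is easy to sketch. Below is a rough, hypothetical illustration in Python (the address, port, packet count, and payload size are all invented): blast a device under test with small datagrams as fast as possible, then compare the sender's count against the receiver's to estimate how many packets the card or its driver dropped. The real test generated traffic at the link layer, at rates far beyond what a user-mode loop like this can reach.

```python
# Rough sketch of a packet flood, NOT the actual WHQL/NCT tool.
# The target address, port, packet count, and payload size are invented.
import socket
import time

TARGET = ("192.0.2.10", 9)     # hypothetical device under test (UDP discard port)
PACKET_COUNT = 1_000_000
PAYLOAD = b"\x00" * 64         # small packets stress per-packet overhead the most

def flood():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.monotonic()
    for _ in range(PACKET_COUNT):
        s.sendto(PAYLOAD, TARGET)
    elapsed = time.monotonic() - start
    print(f"sent {PACKET_COUNT} packets in {elapsed:.1f}s "
          f"({PACKET_COUNT / elapsed:,.0f} pkt/s)")
    # On the receiving side, count what actually arrives; the gap between
    # the two numbers is a crude measure of how well the card kept up.

if __name__ == "__main__":
    flood()
```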

It so happened that at the time this test had currency (1996 era), the traffic on the Microsoft corporate network was approximately 1.7 times worse than the NCT packet stress test. A card could pass the Hell test with flying colors, yet drop 90% of its packets when installed on a computer at Microsoft because the card simply couldn't keep up with the traffic.

The open secret among network card vendors was, "If you want your card to work with Windows, submit one card to WHQL and send another to a developer on the Windows team."

(This rule applied to hardware other than network cards. I was "gifted" a sound card from a major manufacturer and installed it on my main machine. It wasn't long before I found and fixed a crashing bug in their driver.)

[Raymond is currently on vacation; this message was pre-recorded.]

Comments (17)
  1. I remember around the NT 3.51 / early NT4 days a lot of video cards had awful drivers; the one exception was ATI, which seemed to have rock-solid drivers. Later I read somewhere that pretty much every Windows developer was running an ATI video card, which I’m sure played a role in that.

  2. Of course, the real question to ask is: did ATI improve their driver, or did Raymond and the rest of the Windows team include code in Windows to account for ATI’s bad behavior?

    James

  3. Brian, I believe most of us were using S3 cards at the time actually.

  4. Chris says:

    In what specific ways was the Microsoft corporate network worse than Hell? Why didn’t the author of Hell record Microsoft corporate network traffic and simply play it back as his test?

    This funny story also shows how hardware manufacturers should offer big discounts to become official suppliers of Microsoft employee workstations! <:)

  5. Why was the corporate network hell?

    Because there was more traffic going over the corporate network than in any other network that anyone had ever seen.

    Vendors would regularly show up at Microsoft to pitch their newest, coolest hardware solutions. And we’d put them on the corporate network and watch the vendor’s solution collapse under the traffic. There were very few vendors who had systems that could handle the load.

    A large part of the issue was that, at the time, much of the network traffic was NetBEUI, which requires broadcasts for name resolution (it’s a non-routable protocol).

    As a result, there was a HUGE amount of broadcast traffic going out on the network.

    We’ve long since moved away from NetBEUI, but there is STILL an insane amount of traffic on our corporate network. The Microsoft corporate network is one of the most complicated corporate networks in the world.

    And it’s a remarkable tribute to the IT department that it just works.

  6. Anon for obvious reasons says:

    I know some people who would say that it’s not just Microsoft’s network that’s 1.7 times worse than hell… ;-)

  7. microbe says:

    "And it’s a remarkable tribute to the IT department that it just works."

    I’m not sure about this. Your IT department might spend more time on lowering the traffic instead of making it "work" under heavy load. Is there still something else as stupid as NetBEUI?

  8. Microbe,

    Microsoft’s choice of NetBEUI (made in 1984, when DNS hadn’t been invented and TCP/IP name resolution was done with static host files) was the right decision AT THE TIME.

    And it took a long time for that decision to go away.

    At the time that the Microsoft network was first rolled out, NONE of the existing networking standards was capable of being deployed on a single network of the size of the Microsoft corporate network. The only one that had a snowball’s chance was TCP/IP, and it had massive manageability issues (static IP addresses and hard-coded host names).

    We couldn’t deploy TCP/IP on the corporate network without DHCP and WINS (I’d like to see you try maintaining a network with more than 50,000 computers on it with static IP addresses), and DHCP didn’t come online until sometime around 1993 (and DHCP happened largely because Microsoft pushed it through the IETF).

    It takes time to deploy these technologies; it wasn’t until NT 4.0 (around 1996, I believe) that we were able to remove NetBEUI as a transport on the network. It took that long to get all the pieces in place.

    The Microsoft network’s still not much quieter. We’ve removed most of the stupid traffic; now it’s just dealing with the sheer size of the network.

    There are currently significantly more than a quarter of a million computers on the Microsoft corporate network.

    To my knowledge, there isn’t a larger private network in the world.

  9. Wes Miller says:

    "I’m not sure about this. Your IT department might spend more time on lowering the traffic instead of making it "work" under heavy load."

    For better or worse, there is no Ctrl+Z for infrastructure. Once you implement a technology in an infrastructure it’s hard to just "make it go away". You have to find a viable replacement, implement it, and destroy all vestiges of the old plumbing. And with a company as dedicated to backwards compatibility as Microsoft, "killing off" infrastructure plumbing is – really – hard to do.

    My $0.02

  10. andrew queisser says:

    This seems odd to me too. It doesn’t seem hard at all to completely saturate an Ethernet network for the purpose of testing. Why didn’t Hell have a slider from 0-100%? When I was working on network protocol drivers I often caused broadcast storms accidentally and we also had defective network cards that would bring down the network.

  11. Cheong says:

    Will you name some of the "survivor" network cards? One of our servers in the datacenter has a network card that has become quite unstable, and we’re considering replacing it. If I can have that list of network cards, we’ll seriously consider them for sure.

  12. Cheong, the cards that survived 9 years ago aren’t likely to be the best you can get today.

    They were probably either ISA or EISA cards back then; they won’t even plug into your computer.

  13. Tim Smith says:

    It depends on what you mean by saturation. In an 802.3 network, 35% utilization is high traffic. You start getting a lot of collisions. Getting an 802.3 network to 100% utilization would be VERY hard.

  14. Check out Raymond’s post on our corporate network here at Microsoft.

  15. kbiel says:

    Given the limited information in the article above, it would seem the difference between the software Hell and the real thing is one of collisions. It sounds like Hell was pushing packets through, but putting the card on a big network that was made of only routers, bridges, and <ugh> hubs, considering it was 1996, brought a whole different stress test, as the card(s) probably had to negotiate almost constant collisions.

  16. Norman Diamond says:

    Thursday, May 12, 2005 3:06 PM by LarryOsterman

    > At the time that the Microsoft network was first rolled out, NONE of the existing networking standards was capable of being deployed on a single network of the size of the Microsoft corporate network.

    When you define DECnet as not having been a standard (just as Microsoft’s products aren’t standards), you can indeed say that.

    > (I’d like to see you try maintaining a network with more than 50,000 computers on it with static IP addresses),

    Sort of like maintaining a network with more than 50,000 computers on it with static DECnet addresses? Guess what, someone did it, and it worked for more than 10 years. It’s really not that much different from having a registrar assign ranges of IP addresses (this company gets these 8 addresses, this company gets these 4, this ISP gets these 65536, this US company gets these 16777216 even though they’re never going to connect them to the internet, etc.).

Comments are closed.