There’s an interesting TechNet article all about how our IT team keep Microsoft.com up and running. Alongside the article, there’s also a webcast available, so that you can explore the story in more detail. As learning becomes more reliant on technology generally, and the web specifically, there are some strong parallels with the challenges faced by IT teams in universities.
The Microsoft corporate Web site, Microsoft.com, is one of the largest and most heavily visited sites on the Internet, yet it maintains consistently high availability ratings. The team that operates the site meets these demands through a combination of carefully planned infrastructure; collaboration with other teams; and use of technology for maintenance, monitoring, and change management.
During the past eight years, Microsoft.com has achieved one of the highest rankings on the Internet in terms of site availability as measured by Keynote Systems Inc., an independent third party. According to the Keynote reports, Microsoft.com has been available more than 99.8 percent of the time for the past five consecutive years, and more than 99.9 percent of the time for the past two years. The site generates more than 1.2 billion hits per day from more than 57 million unique Internet Protocol (IP) addresses. This traffic generates 200 million daily page views, averages 30,000 Hypertext Transfer Protocol (HTTP) requests per second, and results in an average of 750,000 concurrent client connections.
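To put those availability percentages into more familiar terms, here is a rough sketch that converts them into a yearly downtime budget. It assumes a 365-day year and treats availability as a simple fraction of elapsed time; Keynote's actual measurement method isn't described here, so the function name and the calculation are illustrative only.

```python
# Illustrative arithmetic only: assumes availability is a plain fraction of
# elapsed time over a 365-day year (not necessarily how Keynote measures it).
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in hours) implied by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


for pct in (99.8, 99.9):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> about {hours:.2f} hours of downtime per year")
```

Under those assumptions, "more than 99.8 percent" means fewer than about 17.5 hours of downtime in a year, and "more than 99.9 percent" means fewer than about 8.8 hours.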
The Microsoft.com Operations (MSCOM Ops) team within Microsoft Information Technology (Microsoft IT) operates more than 300 production servers that host approximately 900 Web applications. Based on Internet Information Services (IIS) and Microsoft® SQL Server® database software, the infrastructure design takes advantage of newly released tools and features in support of the team's goal to be an early adopter of Microsoft technologies.
The article describes how the team identifies and mitigates potential points of failure to deliver continuous availability for Microsoft.com, even while adopting new Microsoft technologies in the production environment. It also includes best practices developed over years of operating a highly available, large-scale, and continuously performing Web infrastructure. The best practices address:
- How to identify and address availability issues through building in redundancy and evaluating the need to design solutions to geographic-segmentation challenges.
- Process guidance, including suggestions about when, during the software development life cycle (SDLC), operations and applications engineers can work together to support delivering high availability.
- General guidance on planning for, building, and using proper monitoring based on understanding site traffic.
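The first point above, building in redundancy, can be illustrated with a small back-of-the-envelope calculation. This is a minimal sketch rather than anything taken from the article: it assumes the redundant servers fail independently of one another (which rarely holds exactly in practice), and the 99 percent single-server figure is a made-up number for illustration.

```python
# Hypothetical illustration: assumes servers fail independently and that
# any one working server is enough to keep the site available.
def combined_availability(single_availability: float, replicas: int) -> float:
    """Availability of a group of redundant servers, as a fraction from 0 to 1."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The group is down only when every replica is down at the same time.
    return 1 - (1 - single_availability) ** replicas


for n in (1, 2, 3):
    print(f"{n} server(s) at 99% each -> {combined_availability(0.99, n):.4%} combined")
```

Even under these simplified assumptions, the pattern is clear: each additional redundant server cuts the chance of a total outage sharply, which is why removing single points of failure features so prominently in the guidance.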