Moving to Big Data within Microsoft IT

This is a guest post by Barry Briggs, Chief Architect and CTO, Microsoft IT.

The other day I participated in a live webcast to discuss “the new world of data on-demand now”: in short, the importance of big data processing to organizations. You can watch the 90-minute show here. The webcast included my colleague Terri Jordan, who heads up IT for Microsoft’s retail stores, and one of my favorite writers and the data editor at The Economist, Kenn Cukier [his blog, Twitter: @kncukier].

I wanted to share a few more thoughts on our big data strategy within Microsoft IT. I didn’t have the opportunity to cover all of them during the webcast, and I want this community to hear how we got started.

Certainly the giant search engines and social networking sites have shown some of the value that consumers can get from big data. In the world of the enterprise there are equally remarkable opportunities. In particular, I think the opportunities for enterprise executives to gain greater insight from big data and, ultimately, from what I call “big math” are incredible.

Today we deliver managed, self-service business intelligence (BI) to 39,000 employees using tools like Excel, SharePoint, and SQL services. We want to give business users the same degree of freedom with big data processing, while reducing analysis and reporting time on larger data sets.

Microsoft IT recently completed a series of projects using internally borrowed clusters to show that we can speed up development and application processes using Microsoft’s big data platform, a port of Hadoop to Windows Server codenamed “Isotope.” These demonstrations were the proving ground for future investment in big data, and I’ll briefly explain them below.
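To give a flavor of what such a job looks like, here is a minimal word-count-style example using Hadoop Streaming, which lets you write the map and reduce steps as ordinary scripts. This is an illustrative sketch, not code from our projects: the log format, column layout, and file names are all hypothetical, and nothing here is specific to “Isotope” (a standard streaming job should run on any Hadoop cluster).

```python
#!/usr/bin/env python
# mapper.py -- emits "event_type<TAB>1" for every record in a hypothetical
# tab-separated clickstream log (column 2 is assumed to hold the event type).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        print(f"{fields[1]}\t1")
```

```python
#!/usr/bin/env python
# reducer.py -- sums the 1s per event type. Hadoop Streaming delivers the
# mapper output sorted by key, so a single pass suffices.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key and current_key is not None:
        print(f"{current_key}\t{count}")   # key changed: flush previous total
        count = 0
    current_key = key
    count += int(value)

if current_key is not None:
    print(f"{current_key}\t{count}")       # flush the final key
```

A job like this is submitted with the standard streaming jar, along the lines of `hadoop jar hadoop-streaming.jar -input logs/ -output counts/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (the paths are hypothetical).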

Our first example comes from our marketing teams, who are trying to model the online and offline behavior of many different types of customers. Their data sets range from hundreds of thousands to hundreds of millions of records, and modeling online and offline behaviors is difficult given the limitations of current predictive models. Based on our big data demonstration, in the future we should be able to deliver a more robust set of predictor variables, more control over model development, and more agility in validating models, while shortening the cycle time to see results and reducing development and maintenance costs.
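To make the modeling workflow concrete, here is a purely illustrative sketch of validating a behavior model against a richer predictor set, using scikit-learn and synthetic data; the feature names, data, and library choice are assumptions for the example, not the actual predictor variables or tools our marketing teams use.

```python
# Illustrative only: quickly validating a customer-behavior model against
# a richer set of predictor variables. All features here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 100_000                               # stand-in for a large customer panel

# Hypothetical predictors mixing online and offline behavior.
X = np.column_stack([
    rng.poisson(3.0, n),                  # site_visits_last_30d
    rng.poisson(1.0, n),                  # store_visits_last_30d
    rng.exponential(20.0, n),             # minutes_on_site
    rng.integers(0, 2, n),                # opened_marketing_email
])

# Hypothetical target: did the customer purchase?
logit = 0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.01 * X[:, 2] + 0.8 * X[:, 3] - 2.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Fast, repeatable validation is what shortens the model-development cycle.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```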

Another example is in telemetry. Selected consumers and customers have given us permission to collect data on what they do with their computers, and as a result we collect lots and lots of clickstream and keystroke data. Collecting and analyzing this data gives us much better insight into how customers use Windows and Office and how they interact with other programs. With big data processing, we are able to analyze larger data sets from larger panels of people, resulting in higher product quality and a better customer experience.
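As a small, self-contained illustration of the kind of question this data answers, here is a sketch that summarizes command usage across a panel. The record format and values are hypothetical, and at real scale this aggregation would run as a distributed job rather than in memory.

```python
# Illustrative only: summarizing opt-in clickstream telemetry.
from collections import Counter, defaultdict

# Hypothetical (user_id, application, command) events from opted-in clients.
events = [
    ("u1", "Word",  "Save"),
    ("u1", "Word",  "Paste"),
    ("u2", "Excel", "Save"),
    ("u2", "Word",  "Save"),
    ("u3", "Excel", "ChartInsert"),
]

# How often is each command used, and by how many distinct users?
command_counts = Counter((app, cmd) for _, app, cmd in events)
users_per_command = defaultdict(set)
for user, app, cmd in events:
    users_per_command[(app, cmd)].add(user)

for (app, cmd), count in command_counts.most_common():
    reach = len(users_per_command[(app, cmd)])
    print(f"{app}/{cmd}: {count} uses across {reach} users")
```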

To be clear, I believe big data is one side of the coin; big math is the other. Certainly there are technical challenges in aggregating data in a wide variety of formats from a wide variety of sources arriving at different velocities. However, a bigger challenge over time will be finding and developing the skill sets to drive insight from the data. Skills development and hiring from adjacent markets will be a top priority. For example, I have a team of statisticians in my Enterprise Architecture organization charged with exactly this kind of analysis, and these types of folks will increasingly be in demand.

As for technical challenges, Microsoft IT borrowed server and processing time from our product teams to do our initial demonstrations, and we showed that we could get real value in terms of speed and insight from big data processing. We’ve also experimented with using server and storage capacity in Windows Azure for big data work, and this is promising as well. Big data and big math are great applications for the cloud because of their temporary scale needs, access to external data, and so on.

Of course, you will also need to extend the controls you have in place for privacy and security to your big data solutions. I’d recommend that IT organizations, in cooperation with their risk management teams, develop an enterprise security framework that covers the permissible, and impermissible, uses of data.
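As a sketch of what I mean, the core of such a framework is a mapping from data classifications to permitted uses that every big data job can be checked against. The classifications and rules below are hypothetical placeholders; a real framework would also cover consent, retention, jurisdiction, and much more.

```python
# Illustrative only: a tiny policy check of the kind an enterprise security
# framework might encode. Classifications and uses here are hypothetical.
PERMITTED_USES = {
    "public":       {"analytics", "marketing", "product_telemetry"},
    "internal":     {"analytics", "product_telemetry"},
    "confidential": {"analytics"},           # aggregate analysis only
    "restricted":   set(),                   # no big data processing allowed
}

def use_is_permitted(classification: str, proposed_use: str) -> bool:
    """Return True if the proposed use is allowed for this data class."""
    return proposed_use in PERMITTED_USES.get(classification, set())

assert use_is_permitted("internal", "analytics")
assert not use_is_permitted("restricted", "marketing")
```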

Finally, IT departments need an overall consistent approach to enterprise architecture. I consider enterprise architecture as “connecting the dots,” i.e., making lots of systems work in concert no matter where they live. Part of this step, insofar as possible, is to adopt an enterprise data model: the definitions of the key entities your company deals with. Having such a model will make your big data results more consistent, rich, and insightful.
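As a minimal sketch of what one entry in an enterprise data model might pin down, here is a hypothetical canonical entity; the fields and conventions are purely illustrative, not our actual model.

```python
# Illustrative only: a shared entity definition so that results computed in
# different systems line up. Fields and conventions are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Customer:
    """Canonical 'Customer' entity from a hypothetical enterprise data model."""
    customer_id: str        # the one identifier every system must key on
    segment: str            # e.g., "consumer", "smb", "enterprise"
    country: str            # ISO 3166-1 alpha-2 code
    first_purchase: date

# Any system that maps its local records onto this shape yields joinable,
# comparable results in downstream big data analysis.
alice = Customer("C-0001", "consumer", "US", date(2011, 6, 1))
print(alice)
```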

I believe the IT departments that succeed in turning big data into actionable information will give their business a clear competitive advantage. I’m interested to hear your thoughts on this topic, so please do leave a question or comment.

Barry Briggs