The Internet of Things (IoT) is gaining momentum, with new investment from governments and companies being announced all the time. Developing solutions in this segment is still complex: they require connecting a wide range of computing devices in new ways, standards are still emerging, and frankly the segment is too diverse for a single set of standards anyway. That said, Microsoft has a number of really great technologies in this space that I think we have not said enough about. I have been talking for some time about NETMF’s part in providing a single programming model and tool chain for IoT solutions that spans from small sensors to the cloud. That is a big benefit as the worlds of embedded, PC/server, Web, and cloud merge. Today, I want to introduce Microsoft StreamInsight, a technology that is a huge asset for connected devices.
We have all seen the numbers: 15 billion connected devices in the next few years, and rapid growth from there. There will be all sorts of topologies and device types, but a few things are already clear. There will continue to be price pressure on the endpoints; if you are deploying thousands to millions of them, they need to be as inexpensive as possible. As a result, most endpoints in these solutions will simply forward raw data, and in aggregate that means raw data in very large volumes. Managing that data stream effectively and efficiently will be key to building successful deployments. That is where StreamInsight shines.
The model for traditional database applications is that data is gathered and deposited into the database, and then queries are run against it. This works well for many applications where the data is of proven value (like your name and account number), queries are triggered by some external event (you are making a purchase), and a delay of a few seconds is acceptable. But what if you instead have masses of data (say, from every household that buys its electricity from one utility), the value of each individual data item (the instantaneous power use of one household) is low on its own, and you need to react in milliseconds (when everyone plugs in their new electric vehicle at the same time)? We will see this second scenario with increasing frequency as we instrument the physical world around us. Each sensor will report raw data that individually has little value, and that value may last only seconds to minutes until the next reading, yet we want to respond very quickly to changes in the aggregate.
You could still put everything into the database and continuously run discrete queries against it, but StreamInsight offers a better model for this second use case, one that avoids putting a lot of low-value data into storage and offers an instantaneous response: Complex Event Processing (CEP) on the data streams. Below is a pictorial representation of the traditional and the StreamInsight models.
StreamInsight provides a stream processing engine that lets you run queries written in LINQ (Language Integrated Query) continuously against the data as it flows in. From there you can trigger responses as needed, select just the interesting data that you actually want to put into the database, show the current status of the aggregate data in a Silverlight dashboard, or handle massive amounts of input in any number of other new ways that the architecture makes possible.
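To make the idea of a continuously running query concrete, here is a minimal sketch in Python of the kind of query described above. It is not StreamInsight's API (StreamInsight queries are LINQ expressions in .NET); it only illustrates the technique. It groups hypothetical smart-meter readings into fixed, non-overlapping (tumbling) time windows, aggregates each window as it closes, and passes through only the windows whose total load exceeds a threshold. The `Reading` type, `window_loads`, `overload_alerts`, the 5-second window, and the threshold are all illustrative assumptions, and the sketch assumes readings arrive in timestamp order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    meter_id: str      # hypothetical household meter identifier
    timestamp: float   # seconds since some epoch
    kw: float          # instantaneous power draw in kilowatts


def window_loads(readings: Iterable[Reading], window_s: float) -> Iterator[Tuple[float, float]]:
    """Yield (window_start, total_kw) each time a tumbling window closes.

    The total is the sum of the latest reading from each meter in the window.
    Assumes readings arrive in timestamp order.
    """
    current_start: Optional[float] = None
    latest: Dict[str, float] = {}  # meter_id -> most recent kW in the current window
    for r in readings:
        start = (r.timestamp // window_s) * window_s
        if current_start is not None and start != current_start:
            yield current_start, sum(latest.values())  # window closed: emit and forget
            latest = {}
        current_start = start
        latest[r.meter_id] = r.kw
    if current_start is not None:
        yield current_start, sum(latest.values())  # flush the last open window


def overload_alerts(loads: Iterable[Tuple[float, float]], threshold_kw: float) -> Iterator[Tuple[float, float]]:
    """Keep only the windows whose aggregate load exceeds the threshold."""
    for start, total in loads:
        if total > threshold_kw:
            yield start, total


stream = [Reading("a", 0.0, 2.0), Reading("b", 1.0, 3.0),
          Reading("a", 2.0, 2.5), Reading("a", 6.0, 4.0)]
print(list(overload_alerts(window_loads(stream, 5.0), 5.0)))  # [(0.0, 5.5)]
```

The point of contrast with the database model is that nothing is stored: each reading only updates a small in-memory dictionary and is discarded once its window closes, so only the much smaller alert stream needs to reach storage or a dashboard.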
As you can see from the diagram above, data inputs are conditioned by Input Adapters, which can support data from any source because you can write the adapter your application needs with the SDK. A set of stock adapters also comes with the installation. The same is true for Output Adapters: they can be developed to handle your specific needs for responding to the data, whether that means displaying the aggregate on a workstation, raising alerts, or something else.
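The adapter idea can be sketched in a few lines of Python. This is a hypothetical illustration of the role adapters play, not the StreamInsight adapter SDK: `csv_input_adapter` turns raw text from some source into typed events, `alert_output_adapter` formats result events for whatever sink you choose, and the query in between only ever sees typed events. The `id,timestamp,value` text format and all function names are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    source_id: str
    timestamp: float
    value: float


def csv_input_adapter(lines: Iterable[str]) -> Iterator[Event]:
    """Hypothetical input adapter: turn raw 'id,timestamp,value' text into typed events.

    Malformed lines are dropped so one bad sensor message does not stop the stream.
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue
        try:
            event = Event(parts[0], float(parts[1]), float(parts[2]))
        except ValueError:
            continue
        yield event


def alert_output_adapter(sink: Callable[[str], None]) -> Callable[[Event], None]:
    """Hypothetical output adapter: format each result event and hand it to a sink."""
    def emit(event: Event) -> None:
        sink(f"ALERT {event.source_id} at t={event.timestamp:g}: {event.value:g}")
    return emit


def run(lines: Iterable[str],
        query: Callable[[Iterator[Event]], Iterable[Event]],
        output: Callable[[Event], None]) -> None:
    """Wire input adapter -> query -> output adapter and drain the stream."""
    for event in query(csv_input_adapter(lines)):
        output(event)


# Usage: only readings above 5.0 reach the output adapter.
received: List[str] = []
raw = ["m1,0,1.5", "garbage", "m2,1,9.0", "m3,2,oops"]
run(raw, lambda events: (e for e in events if e.value > 5.0),
    alert_output_adapter(received.append))
print(received)  # ['ALERT m2 at t=1: 9']
```

Because the query depends only on the event type, the same filter could be fed by a serial-port or network adapter, or could write to a dashboard instead of a list, by swapping a single adapter.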
One advantage of the traditional model is that once data is stored in the database, it is preserved even if someone trips over the power cord to the machine. The StreamInsight team has built a robust set of functionality to address this, which is about to ship in the product's second version. So, even though we are dealing with ephemeral streams of data and state inside the processing engine, StreamInsight gives you the tools to checkpoint the processing and recover the state across outages. The product also includes tools such as the Event Flow Debugger for tracing an event back to its root cause: if an event is triggered, you can search back to understand exactly what data triggered it, which might be important in formulating the correct response.
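As a rough illustration of what checkpointing and recovery involve (this is not StreamInsight's own resiliency API), the Python sketch below keeps a running average as operator state, writes that state to disk together with the offset of the last event it has absorbed, and on restart restores the state and skips any replayed events it has already counted. The `RunningAverage` class, the JSON checkpoint format, and the assumption that the upstream source can replay events by offset are all hypothetical.

```python
import json
import os
import tempfile


class RunningAverage:
    """Hypothetical stateful operator whose state survives a restart via checkpoints."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.last_offset = -1  # offset of the last event folded into the state

    def process(self, offset: int, value: float) -> None:
        if offset <= self.last_offset:
            return  # already reflected in restored state; ignore the replayed event
        self.count += 1
        self.total += value
        self.last_offset = offset

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0

    def checkpoint(self, path: str) -> None:
        # Write to a temporary file, then rename it over the old checkpoint, so that
        # (where the rename is atomic) a crash mid-write leaves the previous one intact.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"count": self.count, "total": self.total,
                       "last_offset": self.last_offset}, f)
        os.replace(tmp_path, path)

    @classmethod
    def restore(cls, path: str) -> "RunningAverage":
        op = cls()
        if os.path.exists(path):  # no checkpoint yet: start from empty state
            with open(path) as f:
                state = json.load(f)
            op.count = state["count"]
            op.total = state["total"]
            op.last_offset = state["last_offset"]
        return op


# Usage: checkpoint after two events, lose the in-memory state, then recover
# while the (hypothetical) source replays everything from the beginning.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "average.json")
values = [1.0, 2.0, 3.0, 4.0]
op = RunningAverage()
for offset, value in enumerate(values[:2]):
    op.process(offset, value)
op.checkpoint(checkpoint_path)
del op  # simulate losing the process's in-memory state

recovered = RunningAverage.restore(checkpoint_path)
for offset, value in enumerate(values):
    recovered.process(offset, value)
print(recovered.count, recovered.average)  # 4 2.5
```

Recording the last processed offset alongside the aggregate is what lets recovery be exact here: without it, replayed events would be counted twice.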
With the ability to handle up to 100,000 events per second from millions of data sources, you can see why I think this functionality is a great fit for the Internet of Things, where applications will generate massive amounts of data and the traditional database model is less likely to fit. We are starting to experiment with using StreamInsight with NETMF devices, so watch for some follow-ups to this blog. In the meantime, if you want to read more, here are some sources:
– The Hitchhiker's Guide to StreamInsight Queries (on query writing): http://blogs.msdn.com/b/streaminsight/archive/2010/06/08/hitchhiker-s-guide-to-streaminsight-queries.aspx
– Overview session from TechEd North America 2011: https://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI303