A High-Level StreamInsight 2.1 Preview

Hello Folks,

With StreamInsight 2.0 barely out the door, it may seem too soon to start talking about the next version, but the team has been busy adding features and trying to keep to our usual 6-8 month release cycle. And with 2.1 shaping up to be a pretty significant release for us, we’d like to start giving you a preview.

I’m not going to dig into a lot of technical details in this post – those will be forthcoming – but I would like to give a high-level overview of what we’ve done and why we’ve done it. Let’s start with motivation.

We’ve received a lot of feedback on StreamInsight 2.0 and earlier. To summarize some of it, we’ve heard that:

  • The object model is somewhat hard to understand. E.g., what exactly is a query vs. a query template?
  • It’s difficult to write the basic plumbing. In particular, the state machine that adapters must adhere to makes them hard to write.
  • Although many of the aforementioned problems can be avoided by using sequence input and output (IObservables and IEnumerables), sequences are restricted to the embedded host: as soon as you want to run remotely, you have to use adapters and the full object model.
  • Using checkpointing likewise requires that you abandon sequence input.
  • The query topologies supported by checkpointing are too limited. In particular, many users want to have a single query connect to a remote data source, ingest the data into StreamInsight, and present it to other queries. This is done with published streams, which currently preclude checkpointing.

In addition, the Reactive Extensions (Rx) community has been asking for a server like our remote host.

While any code written against StreamInsight 2.0 and earlier remains supported, StreamInsight 2.1 includes a rather large update to our programming surface that addresses all of these points. We have:

  • Created a new object model that is much clearer and more consistent. We’ll talk about details in another post, but this object model is heavily influenced by Rx’s sources, sinks, and subjects.
  • Supported observable and enumerable workflows in the server. These can be combined with temporal logic or used independently. E.g., you can use Rx to marshal data into StreamInsight, or you can use the server to host solely-Rx pipelines – whichever best matches your workflow.
  • Eliminated the need to use the complex adapter contracts in most cases. Instead of adapters, ingress and egress can be handled with observables and enumerables. Adapters remain fully supported in the new model, so you can continue to use them.
  • Expanded the set of workloads that support checkpointing to include those with shared computation.

We’ll be sharing more details about the upcoming release over the next few weeks.

Cheers
-Isaac