SYSK 249: Choosing the Right Microsoft Integration Technology

SYSK 249: Choosing the Right Microsoft Integration Technology

There are a few good sources on this topic, including:

1. “Choosing the Right Technology for Integrating Systems” presentation by Kris Horrocks, Sr. Technical Product Manager of Microsoft Application Platform

2. “Microsoft IntegrationTechnologies” PowerPoint presentation presented at BizTalk Deep Dive training seminar

3. “Understanding MS Integration Technologies 1.0” white paper

but none of them appear to be readily available for reference.

So, I took the liberty of combining the materials from the references above, and adding a few things of my own… The result are the guidelines below.

Introduction

There are many integration technologies available to developers, including WSE, SQL Integration

Services, Windows Workflow Foundations, Windows Communication Foundations, ASMX, SQL Replication, Enterprise Services, BizTalk Server, SQL Broker Service, Host Integration Server, MSMQ, .NET Remoting, to name a few.

The reason for this document is to review the appropriate uses of various Microsoft integration products and technologies, to clarify how these technologies are related, how they differ, and what guidelines you can use to choose the right one. The goal is to get clear and simple guidance developers and architects can use to make these choices, and learn the tradeoffs the guidance embodies: messaging scenario support, manageability, architectural features, performance, ease of use, etc.

There is no silver bullet for application integration. Different situations call for different solutions, each targeting a particular kind of problem. While a one-size-fits-all solution would be nice, the inherent diversity of integration challenges makes such a simplistic approach impossible. To address this broad set of problems, Microsoft has created several different integration technologies, each targeting a particular group of scenarios. Together, these technologies provide a comprehensive, unified, and complete integration solution.

Microsoft’s integration technologies can be grouped into several categories:

· Technologies for integrating applications directly, including ASP.NET Web Services (ASMX), .NET Remoting, and Enterprise Services. All of these will are now subsumed by a unified foundation for service-oriented applications, called Windows Communication Framework (WCF).

· Technologies for integrating applications through queues, including Microsoft Message Queuing (MSMQ) and SQL Service Broker (SSB). Both of these are also usable via WCF.

· Technologies for integrating Windows applications with applications and data on IBM systems. This diverse set of solutions is provided by Host Integration Server (HIS) 2006.

· Technologies for integrating applications through a broker, the approach taken by BizTalk Server (BTS) 2006.

· Technologies for integrating data, including SQL Server Integration Services (SSIS) and SQL Server Replication.

Each of these technologies has its own distinct role to play in integrating applications. All of them also have important things in common, however. All can be used from a single development environment, Visual Studio 2005, and all rely on a common foundation, the .NET Framework. Given this, combining these technologies is straightforward, making it easier to solve complex integration challenges.

Microsoft Integration Technologies

At the lowest level, integration is about getting two pieces of software working together to address a larger business need. The chosen solution often depends on many factors, including non-technical ones like an organizational structure, or a requirement to integrate with external business partners. Moreover, the solution may be different for integrating legacy systems, or purchased (“off-the-shelf”) products vs. home-grown solutions. A need to integrate heterogeneous data stores, or data stored distributed across different applications also present unique set of requirements. Finally, workflow based scenarios implementing business processes, the maturity level of applications in terms of their service orientation, all play a role in defining what is the right integration technology for a specific scenario.

For simplicity, let’s break down most commonly used integration technologies of today into two categories:

1. Messaging integration

· Used to transmit messages and associated data to trigger some work

· May require transformation of data

· Usually, supports sending and processing messages (synchronously or asynchronously), e.g. an event occurred in application A, which needs to be known and acted upon by applications B and C.

· Data size is usually not very large (bytes and kilobytes, rarely megabytes)

· Often embedded into applications

· Frequently is done on behalf of a particular user or application (something/somebody actively initiated it)

2. Data integration

· Used to copy data from one system to another

· May require transformation of data

· Usually used to move large amounts of data at a time, in bulk (megabytes or even gigabytes of data)

· Frequently implemented as a nightly process running once per day

· Can be thought of as “application independent” or “fundamentally passive”, i.e. it’s done as part of the data maintenance process, on behalf of the "infrastructure", e.g. data warehouse, not end user.

Data Integration

Messaging Integration

SQL Integration Services: Data integration service tightly coupled with SQL Server

SQL Replication: Data synchronization service implemented by SQL Server

MSMQ: Low-level message queuing implementation on Window platform

Windows Communication Framework: Unified framework for exposing and consuming services

BizTalk Server: Business process and integration server

SQL Service Broker: Reliable data tier messaging within SQL

SQL Integration Services (SSIS)

· SSIS is the recommended integration technology for loading data from many, often distributed, operational data stores into a single data warehouse. E.g. if you have partial data about certain entities distributed across different data stores (e.g. plant performance, including operations, human resources, financials, etc.), and you’d like to have a “single view of the truth”, consider SSIS as the integration technology for transferring the data into a data warehouse, which, in its turn, is used by decision support and business intelligence systems.

· SSIS has the functionality for building workflows to get data from desperate systems into one data warehouse, transform it, including doing data cleansing, and do “fuzzy matching” to correlate, i.e. find relationships, across data entities with different identifiers by applying rules, or by escalating this correlation job to a human for reconciliation. SSIS packages can be scheduled or run manually, or run in reaction to triggers.

· Another important usage of SSIS is to get a holistic view of your entity (e.g. plant performance) in real time. Instead of having each application requiring such view to establish point-to-point connections with each applications controlling a portion of plant operations data (e.g. Great Planes, Maximo, TRESS, HQ Historian, etc.) , and then merging the data, you can build a “clearing gateway” – a database that has all these federated (distributed) queries -- and provide a simple, single interface to query against. NOTE: these data stores don’t have to be SQL Server based; they can be any data providers supporting OLE DB or ODBC protocols (e.g. flat files, DB2, Oracle, Excel documents, Active Directory, etc).

· Another common usage of SSIS is the simple data transfer between heterogeneous DBMSs (e.g. SQL Server, Oracle, Microsoft Excel)

· NOTE: SSIS workflow is targeted for data integration and transformation, i.e. ETL (extract-transform-load) type of scenarios. It is not a general purpose workflow or integration tool. SSIS is not the right tool for integrating sending of an e-mail into a workflow, or FTPing files to a trading partner, or exchanging EDI messages, as these types of activities may be difficult to implement within the confines of the SSIS environment, and there are other tools better suited for this purpose.

· The primary integration scenarios for SSIS are:

o Combining information from a group of operational databases into a data warehouse. Along with powerful support for data transformations, SSIS provides graphical tools for defining the ETL process, fuzzy logic for data cleansing, error handling, and other features to make integration of diverse data easier; i.e. warehousing and federated queries.

o Transferring data from one DBMS to one or more other DBMSs. Because SSIS supports heterogeneous data sources, the products involved might or might not be SQL Server 2005.

o Loading data into SQL Server databases from flat files, spreadsheets, and other diverse data sources.

· Key Features

o Extract, Transform, Load (ETL) designers and engine

o Error Handling and Fuzzy Logic components for data cleansing

SQL Replication

· SQL Server Replication allows replicating data across two or more SQL Server databases. SQL Server Replication is designed for situations when you need to have a copy of the same data in multiple databases, and to have that data automatically kept in sync. For example, letting applications running on a group of web servers spread their read requests across a group of identical databases, each on its own machine, can improve the application’s scalability and availability. For this arrangement to work, all updates must go to a single database instance, and then be propagated to the read-only copies, and SQL Server Replication is well suited to handle this type of need.

· If synchronization of identical data sources in near real time is required, SQL Server Replication is the best choice. No other integration technology provides support for rapidly tracking and delivering changes as they occur. While it’s possible to write a custom replication application using BTS 2006 or other integration technologies, it’s challenging to do correctly. The result also wouldn’t have the performance or the features that are included in SQL Server Replication.

· SQL Server Replication replicates data identically using either all columns of the selected tables or a subset of the table columns, while maintaining the data in the same format. SSIS has the additional capability to transform the data as it is moved, and so it’s a better choice for copying entire tables that have different structures. Yet unlike SQL Server Replication, SSIS doesn’t support replicating incremental row-level changes—it copies the whole table—which means SSIS typically has lower performance. As an ETL tool, SSIS focuses on integrating diverse data rather than replicating identical data, the problem for which SQL Server Replication was designed.

· One of the most important benefits of this technology is a powerful user interface that lets database administrators easily define what data should be replicated, see differences between tables, and more. A replication conflict viewer is also provided, allowing data conflicts that occur during replication to be resolved.

· The primary integration scenarios for SQL Server Replication are:

o Replicating data between tables in one or more SQL Server instances. Those instances might be running on servers, clients, or even mobile devices that are only occasionally connected. Rather than copying entire tables, SQL Server Replication replicates incremental row-level changes, letting updates be propagated at near real-time speeds.

o Using SQL Server as a source for data that is replicated to IBM and Oracle databases.

o Using Oracle as a source for data that is replicated to SQL Server, IBM, and Oracle databases. In this case, data is first replicated to a SQL Server database; then replicated to the other databases.

· NOTE: Data replication is useful in a variety of different scenarios, and so SQL Server Replication can be applied in numerous ways. In fact, even though this technology can be correctly seen as a tool for integration, it’s just as accurate to view SQL Server Replication as a solution for data synchronization.

MSMQ

· Direct communication between parts of an application is simple to understand and straightforward to implement. However, rather than communicating directly with another application, it’s often better to use queued messaging, as the target and source do not need to be running at the same time. This style of communication relies on the presence of one or more queues between the sender and receiver, with applications sending messages to and receiving messages from these queues.

· MSMQ is part of the Windows platform, and is integrated with .NET framework.

· Queued communication lets applications interact in a flexible, adaptable way. One major benefit of this approach is that the receiving application need not be ready to read a message from the sender at the time that message is sent. In fact, the receiver might not even be running when the message is sent. Instead, messages wait in a queue, usually stored on disk, until the receiver is ready to process them.

· The primary integration scenarios for MSMQ are:

o When asynchronous communication is required between two or more Windows applications.

o When the sender and receiver might not be running at the same time.

o When message-level logging is required.

· Key features

o Built into Windows

o .NET APIs and MMC Management

o Supports in memory or durable storage

o Not conversational

Windows Communication Foundation (WCF)

· WCF is often used to implement application-to-application (or service) connectivity, regardless whether they are running on the same computer, within the same domain, or not. WCF bring in unification of programming models such as web services, message queuing, enterprise services, and .NET remoting.

· NOTE: You can change runtime configuration attributes, such as the protocol, formatter, security, transaction support, etc. , via a configuration file. This allows for building very flexible solutions that can be deployed in a wide range of environments.

· NOTE: point-to-point integrations may lead to a point where the high number of direct connections is too hard to manage, especially in the light of the fact that you don’t have a single view of what those connections are. WCF doesn’t address the limits of point-to-point integration. This kind of ad-hoc integration does address application needs on a project by project basis is a viable method for organization of small to medium scale. However, if managing service availability or govern access to these services, is important, then WCF may not be the best tool for the job, especially if you integrated with your business partners. Note: this issue is technology agnostic – it’s about whether you approach integration in an ad-hoc or systematic way.

· The primary integration scenarios for WCF are:

o Direct web services communication between a Windows application and an application built on another web services platform.

o Direct communication between a Windows application and another Windows application or service.

o Communication via message queues between a Windows application and another Windows application using WCF over MSMQ or SSB. Note: WCF provides a common programming interface for both queued and direct communication, and it also allows a single application to expose endpoints supporting both communication styles.

· Key features

o Unified programming model

o Broad standards support

o WS-Security, WS-Trust, WS-SecureConversation, WS-Policy, WS-Coordination, WS-AtomicTransation, - WS-ReliableMessaging

BizTalk Server

· Rather than integrating applications directly, it sometimes makes more sense to connect them via a broker. A broker is software that sits between the applications being integrated, interacting with all of them. By providing a common connection point, brokers avoid the complexity that can arise when several applications are connected directly to one another. Brokers can provide a range of integration services, including transformations between different message formats and support for diverse communication technologies. A broker can also act as a platform for its own application logic, providing the intelligence to control a business process. For Windows, connecting applications through a broker means using BizTalk Server 2006.

· BTS (BizTalk Server)2006 sits in the middle of a group of applications. By providing adapters for various communication mechanisms, including MSMQ, WCF, EDI, and many more, BTS 2006 can communicate with Windows and non-Windows applications in a number of different ways. BTS 2006 also provides other integration services, including:

o The ability to graphically define orchestrations, logic that interacts with applications on other systems to drive an integrated process, together with runtime services for orchestrations, such as state management and support for long-running transactions.

o Graphical definition of XML schemas for messages, along with the ability to define transformations between incoming and outgoing messages that use those schemas.

o Business-to-business (B2B) integration features, including support for Electronic Data Interchange (EDI), RosettaNet, HL7, and other standard interchange formats. It also includes services for managing interactions with trading partners.

· NOTE: each individual connection may still leverage technologies such as WCF, but instead of talking directly to another application, they talk to a hub like BizTalk Server.

· BizTalk Server 2006 is also a business process server. Viewed from this perspective, it provides features such as:

o Support for graphical definition of business processes using an Orchestration Designer hosted in Visual Studio. A lightweight Visio-hosted Orchestration Designer for Business Analysts is also included.

o Business Activity Monitoring (BAM), providing real-time displays of business process information to the information workers that rely on those processes.

o A Business Rules Engine (BRE), letting complex business rules be defined, accessed, and maintained in a single place.

· The benefits, just to name a few, are: allowing to take a service offline (e.g. for maintenance) and queue incoming requests for processing at a later time (i.e. manage availability), govern access by time of day and other attributes, comprehensive logging, ability to see the state (e.g. which applications deployed, which are running, ability to see all ins-and-outs, i.e. which application is connected to which, used protocols) and health of the environment from a single console, etc. In summary, BizTalk server provides a managed environment for service availability.

· The primary integration scenarios for BTS 2006 are:

o Creating brokered application-to-application message-based integration, especially when data mapping and support for diverse communication mechanisms is required.

o Implementing integration processes, including long-running processes that take hours, days, or weeks to complete, and processes with complex business rules.

o Addressing B2B integration, including situations with many trading partner interactions and those that require industry standards such as RosettaNet and HL7.

o Creating business processes that give information workers real-time visibility into an integrated process.

· NOTE: Using a brokered approach can add cost to an integration solution. Still, for scenarios such as those just listed, the additional overhead of a broker is more than made up for by the value it provides. In these situations, BTS 2006 is the best approach to integrating diverse applications.

· Key features

o Managed security, reliability, scalability

o Management console and integration with Microsoft Operations Manager (MOM)

o Content/context based, multi-endpoint, message routing and transformation

o Business process state management and distributed activity monitoring

o Business Rules Engine

SQL Service Broker

· SQL Server Broker (SSB) is a relatively new communication technology provided as part of SQL Server 2005. Applications written as stored procedures in T-SQL or in a language based on the Common Language Runtime (CLR), such as C#, can now use SSB to communicate via message queues.

· Service Broker gives you two core capabilities:

1. Queuing right in the database. Examples include running a job asynchronously, often in response to a trigger, with reliable (guaranteed) query processing.

2. Ability to abstract data stores that need to talk to each other from each other, i.e. loose coupling. E.g. if a record change needs to be published to other “systems”, instead of hard coding the identities of these recipients, you can use SSB as an intermediary between the data source and recipients of the updates, using a publish-subscribe type of model. NOTE: recipients don’t have to be SQL Server based.

· By using SQL Server 2005 to persistently store queued messages, SSB can provide efficient, full-featured communication, including high-performance transactional messaging, integrated backup and recovery mechanisms, and more.

· The primary integration scenarios for SSB are:

o Connecting stored procedures in SQL Server with another instance in a messaging paradigm

o Applications that must receive a message and perform an update to SQL Server 2005 within the same transaction

o Need sticky conversations

· Core Scenarios

o Asynchronous triggers

o Reliable query processing

o Reliable data collection

o Large-scale batch processing

· Key features

o T-SQL constructs for queues, services, and messages

o Conversational and reliable styles

NOTE: Below are examples where MSMQ (via WCF) is preferable to SSB:

o Communication between applications that need only in-memory queuing, i.e., those that don’t require messages to be stored persistently in transit. MSMQ supports memory-based queues, while SSB does not.

o Situations where the extra license cost of SQL Server 2005 isn’t acceptable. Unlike SQL Server 2005, MSMQ is part of Windows, and so there’s no extra charge for using it. Even though SSB is included with the Express edition of SQL Server 2005, which doesn’t require a paid license, every message sent via SSB must traverse at least one licensed copy of SQL Server 2005.

Host Integration Server

· Host Integration Server acts as a gateway between the Windows world and the Host System (e.g. mainframe) world by acting as a basic network gateway providing file and print services between the two environments, and by providing an environment for application and transaction integration (e.g. between an Windows application and an application running within a CICS, IMS, DB2, OS/400 and other environments. It handles complex mediations of security contexts, protocols, transactions, etc. between the two systems. Note, HIS 2006 continues to deliver the network level interop, but the application level integration (e.g. transactions) is now moved to a set of adapters for BizTalk Server.

· The primary integration scenarios for HIS are:

o Connecting Windows systems to IBM zSeries mainframes and iSeries midrange systems using Systems Network Architecture (SNA) and other IBM communication technologies, including SNA over TCP/IP.

o Integrating Windows security with IBM mainframe or midrange security systems, including IBM’s Resource Access Control Facility (RACF) and Computer Associates’ ACF2 and Top Secret.

o Accessing existing CICS and IMS applications, either directly from .NET Framework applications using HIS 2004’s Transaction Integrator or via Web services.

o Creating Windows applications that access data stored on zSeries and iSeries systems, including VSAM data and relational data stored in DB2.

o Connecting MSMQ to IBM’s WebSphere MQ, allowing messages to be transferred between these two message queuing technologies.

· Key features

o Transaction integration with CICS, IMS and DB2

o VS Projects and Wizards for COBOL and RPG host code

o BizTalk Adapters

IMPORTANT: the type of work being done should dictate the technology choices, and in many cases, it may be a combination of different integration technologies -- SSIS is a data tier integration level technology, and WCF and BizTalk are application level integration technologies, often working on behalf of a user or application, receiving messages from some systems and sending them to others. Consider the following scenario – you need to processing a shipper cargo nomination, which logically fits well with BizTalk provided tools, but, as a part of it, if you need to get a single view the shipper info + nomination from different systems and present to a person processing the request. Since BizTalk is good at handling business processes, and coordinating various activities across dispersed systems, and people facing interfaces, but it’s not the best choice for grabbing data from several data stores, we can enlist SSIS to do the federated query, bring the data in, cleansed it, aggregate it, and pass in back to BizTalk for further routing and processing.

Summary of Microsoft Integration Technologies

Integration Style

Technology

Primary Integration Scenarios

Integrating applications directly

ASP.NET Web Services (ASMX)

Connecting Windows applications with Windows and non-Windows applications via SOAP

.NET Remoting

Connecting Windows applications with other Windows applications via distributed objects

Enterprise Services

Connecting Windows applications with other Windows applications that use distributed transactions, object lifetime management, etc.

Windows Communication Framework (WCF)

Connecting Windows applications with Windows and non-Windows applications using web services, distributed transactions, lifetime management, etc. (subsumes ASMX,.NET Remoting and Enterprise Services)

Integrating applications through queues

Microsoft Message Queuing

Connecting Windows applications with other Windows applications using queued messaging

SQL Service Broker

Connecting SQL Server 2005 applications with other SQL Server 2005 applications using queued messaging

Windows Communication Framework (WCF)

Connecting Windows applications with other Windows applications using queued messaging (via MSMQ and/or SSB)

Integrating with applications and data on IBM systems

Host Integration Server 2006

Connecting Windows applications with IBM zSeries and iSeries applications and data

Connecting MSMQ with IBM WebSphere MQ

Integrating applications through a broker

BizTalk Server 2006

Connecting Windows applications and non-Windows applications using diverse protocols

Translating between different message formats

Controlling business processes with graphically-defined orchestrations

Connecting with business partners using industry standards, such as RosettaNet and HL7

Providing business process services, such as Business Activity Monitoring and a Business Rules Engine

Integrating data

SQL Server Integration Services

Combining and transforming data from diverse sources into SQL Server 2005 data

SQL Server Replication

Synchronizing SQL Server data with copies of that data in other instances of SQL Server, Oracle, or DB2