Last Monday, Roger Wolter joined my team from the SQL Server product team. Roger is well known in the data community and has spent a lot of time lately on SQL Server Service Broker. Welcome, Roger.
This prompts me to follow up on my "SOA, 6 blind men and an elephant" post and talk a little bit about an aspect that is often overlooked in designing and building Connected Systems: DATA.
In my experience talking to many enterprises, the discussions often gravitate around services: how to design a service, what the right granularity is, what the appropriate contract is, etc. One aspect that is often not discussed enough is how to think about data in this new class of applications, where systems are all getting connected together.
The classic example needing to be solved is the 360 degree view of the customer. In a typical enterprise, information about customers is stored (and often duplicated) in several systems. For example, part of my customer info is in the CRM systems, other parts of it sit in a couple of proprietary line-of-business applications, and of course if the enterprise has several "relationships" with the customer, the information is duplicated for each "relationship". In addition to that, data is more and more in non-relational formats (XML, email, .xls spreadsheets…) and in non-transactional stores (batch-oriented mainframes, file systems, data accessible via a web service call). See image below:
With the movement from "classic" 3-tier architecture to connected systems, data is evolving from session-oriented, relational and transactional to message-driven and loosely coupled. Among other things, this promotes the creation of several "types" of data, each requiring its own specific treatment. These different types are:
- Reference data
- Think of it as a catalog. The data is immutable: once published, it is always the same. It is used to create service requests, and its format must be interpretable by all parties. It must have a stable identifier (e.g. Item 23, Page 64, March 2005 catalog) and can be infinitely replicated and cached.
- Resource data
- Resource data has a very long lifetime: think SKUs, Customers, Accounts. Its format is private to the service (it does not matter how the service stores it, as long as it is stored). Access to resource data within the service is highly concurrent.
- Activity data
- The lifetime of activity data is generally bounded by the business activity (think of it as the result of a query), even though it can be extended for reporting purposes. The format is of course private to the service. Activity data is generally keyed to a business activity (e.g. backorder for order #268) and can usually be partitioned.
- Service interaction data.
- The only way services should communicate is through service interaction data. The format must be interpretable by all parties, e.g. everyone must be able to read and interpret the “order form”. It is OK if a message gets lost, because it is OK to send another copy (note that the copy had better be an exact replica of the lost one). This is why it is a good idea to keep copies of all messages into and out of the service.
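To make the four types a bit more concrete, here is a minimal Python sketch. Everything in it (`CatalogItem`, `OrderForm`, `OrderService`, `OrderClient`, the `message_id` field) is a hypothetical name made up for illustration, not something from a real product. It assumes messages carry a unique id, which is one simple way to make a resent copy harmless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    """Reference data: immutable once published, with a stable identifier."""
    catalog: str   # e.g. "March 2005"
    page: int
    item: int
    description: str


@dataclass(frozen=True)
class OrderForm:
    """Service interaction data: a message every party can interpret."""
    message_id: str   # assumption: each message carries a unique id
    item: CatalogItem
    quantity: int


class OrderService:
    """Hypothetical service; its resource and activity data stay private."""

    def __init__(self):
        self._stock = {}        # resource data: long-lived, private format
        self._backorders = {}   # activity data: keyed to a business activity
        self._inbox = {}        # copies of every message received

    def add_stock(self, item, quantity):
        self._stock[item] = self._stock.get(item, 0) + quantity

    def receive(self, form):
        # A resent copy of a message already seen is ignored.
        if form.message_id in self._inbox:
            return "duplicate"
        self._inbox[form.message_id] = form
        available = self._stock.get(form.item, 0)
        if available >= form.quantity:
            self._stock[form.item] = available - form.quantity
            return "filled"
        # Ship what is available and record the rest as a backorder.
        self._backorders[form.message_id] = form.quantity - available
        self._stock[form.item] = 0
        return "backordered"


class OrderClient:
    """Hypothetical sender that keeps a copy of every outgoing message."""

    def __init__(self):
        self._outbox = {}

    def send(self, service, form):
        self._outbox[form.message_id] = form
        return service.receive(form)

    def resend(self, service, message_id):
        # The stored copy is resent unchanged, so it is an exact replica.
        return service.receive(self._outbox[message_id])
```

The only things that cross the service boundary here are `CatalogItem` and `OrderForm`; the stock table and the backorders never leave `OrderService`, which is the point of keeping resource and activity data private.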
The picture below gives you an overview of the characteristics of these different types of data:
Also, you want to think hard about single "ownership" of data: only the data owner is allowed to update the data, even though the data is used by many other parties. In this context you also need to introduce the concept of tentative update requests: if you are not the owner of the data but need to update it, the best you can do is to request the update. In a non-transactional environment (frequent in these systems) the update might or might not happen; you therefore need to be able to cope either way (potentially introducing compensating logic, manual intervention...).
As you can see (and these are just a few examples), there is a significant level of architectural thinking that needs to take place around Data in Connected Systems.
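Here is a rough sketch of what a tentative update request could look like in Python. The names (`CustomerOwner`, `NewsletterService`, `UpdateRequest`, `Outcome`) and the validation rule are all invented for illustration. To keep it short, the owner answers synchronously; in a real connected system the answer would more likely arrive later as a message, or not at all.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass(frozen=True)
class UpdateRequest:
    request_id: str
    customer_id: str
    new_email: str


class CustomerOwner:
    """Hypothetical owning service: the only code that updates customer data."""

    def __init__(self, customers):
        self._customers = dict(customers)

    def email_of(self, customer_id):
        return self._customers[customer_id]

    def handle(self, request):
        # The owner is free to refuse; here it only checks two simple rules.
        if request.customer_id not in self._customers or "@" not in request.new_email:
            return Outcome.REJECTED
        self._customers[request.customer_id] = request.new_email
        return Outcome.APPLIED


class NewsletterService:
    """Hypothetical non-owner holding a local, tentative copy."""

    def __init__(self, owner):
        self._owner = owner
        self._pending = {}    # request_id -> (customer_id, previous local value)
        self.local_email = {}

    def request_change(self, request):
        previous = self.local_email.get(request.customer_id)
        self._pending[request.request_id] = (request.customer_id, previous)
        self.local_email[request.customer_id] = request.new_email  # tentative
        outcome = self._owner.handle(request)
        if outcome is Outcome.REJECTED:
            self._compensate(request.request_id)
        else:
            del self._pending[request.request_id]
        return outcome

    def _compensate(self, request_id):
        # Compensating logic: undo the tentative local change.
        customer_id, previous = self._pending.pop(request_id)
        if previous is None:
            self.local_email.pop(customer_id, None)
        else:
            self.local_email[customer_id] = previous
```

The non-owner never writes to the owner's data directly. It records what it changed locally so that it can undo it if the owner says no.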
How can you learn more about all this? Well, ask Roger, since that is what his job is all about now 🙂 To be fair to him, he just started on this initiative a few days ago, so let's give him some time; but expect some very cool stuff coming out soon.
Some additional topics I discussed with Roger are:
- Scaling out data
- Asynchronous, queued messaging using SQL Server Service Broker
- SQL CLR, i.e. when it makes sense to put logic in SQL and when it does not. By the way, last week I was in a presentation where there was a big debate about whether logic should ever live in a database; one person even recommended never writing a single stored proc. It was fun 🙂
Do not hesitate to send me topics you would like covered as part of the data pillar (by email, by leaving a comment here, or via trackback).
Finally, to give you something to chew on while we craft some of our magic, here is a very good paper on entity aggregation.
In summary, thinking about services is important, but it cannot be at the expense of data. Hopefully this post gave you some motivation to revisit how much time to dedicate to data architecture (beyond "classic" database design) and of course made you realize that there is much more than services and ESBs to worry about when you design, build and deploy your Connected Systems (err. SOA).