Enterprise IT Integration and Data Security in the Canonical Data Model?

I spend a lot of my time staring at a single problem: how to make a large number of systems "speak" to one another without creating piles of spaghetti links and buckets of operational complexity.

So this past week, I've been thinking about security in the integration layer.

At Microsoft, we have a lot of competing business interests.  One company may be a Microsoft Partner in one channel, a customer in another, and a competitor in a third.  (IBM is a perfect example, as is Hewlett-Packard.  We love these guys.  Honest.  We also compete against them.)  To add to the fun, in the vast majority of cases, our partners compete with each other, and we need to absolutely, positively, with no errors, maintain the confidentiality and trust that our partners place in us.  To protect the 'wall' between Microsoft and our partners, and between our partners and each other in competitive spaces, while still allowing open communication in other spaces, we have some pretty complicated access rules.  They apply not only to customer access, but also to how the account managers at Microsoft, who work on the customers' behalf, can access internal data.  For example, an account manager (a Microsoft employee) assigned to work with Dell as an OEM cannot see the products that Hewlett-Packard has licensed for its OEM division, because he or she may accidentally expose sensitive business information between these fierce competitors.

In this space, we've developed a (patented) security model based on the execution of rules at the point of data access (Expression-Based Access Control, or EBAC).  This allows us to configure some fairly complicated rules to define what kind of data a customer may directly access (or an employee may access on behalf of their customers).  So I'm looking at the EBAC components as well as more traditional Role-based Access Control (RBAC) and thinking about integration.
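To make the idea concrete, here is a minimal sketch of what evaluating rules at the point of data access might look like.  This is not the patented EBAC implementation; every name, rule, and data shape below is a hypothetical illustration of the pattern.

```python
# Hypothetical sketch of expression-based access control (EBAC): each rule
# carries an expression (a predicate) that is evaluated at the moment data
# is accessed, rather than a static role-to-resource mapping.

def can_access(subject, resource, rules):
    """Allow access only if every rule applicable to this resource type passes."""
    applicable = [r for r in rules if r["resource_type"] == resource["type"]]
    return all(r["expr"](subject, resource) for r in applicable)

# Illustrative rule: an account manager may only see OEM licensing data
# for the account he or she is assigned to.
rules = [
    {
        "resource_type": "oem_license",
        "expr": lambda subj, res: res["account"] == subj["assigned_account"],
    }
]

dell_am = {"role": "account_manager", "assigned_account": "Dell"}
dell_license = {"type": "oem_license", "account": "Dell"}
hp_license = {"type": "oem_license", "account": "HP"}

print(can_access(dell_am, dell_license, rules))  # True
print(can_access(dell_am, hp_license, rules))    # False
```

The key difference from plain RBAC is that the decision depends on a runtime expression over the subject and the data element, not just on the subject's role.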

What right does any application have to see a particular data element?

This gets sticky. 

I can basically see two models. 

Model 1: The automated components all trust one another to filter access at the service boundary, allowing them to share data amongst themselves freely. 

Model 2: Every request through the system has to be traced to a credential, and the data returned in a call depends heavily on the identity of the person instigating the request.

Model 1 is usually considered less secure than model 2.

I disagree. 

I believe that we need a simple and consistent infrastructure for sharing automated data, and that we should move all "restriction" to the edge, where the users live.  This allows the internal systems to have consistently filled, and consistently correct, data elements, regardless of the person who triggered a process.
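The "restriction at the edge" idea can be sketched in a few lines.  In this hypothetical example, internal services always exchange the complete record; the filter is applied only at the user-facing boundary, just before data leaves the trusted zone.  All field names and roles here are made up for illustration.

```python
# Inside the trusted zone, services share consistently filled records.
FULL_RECORD = {"invoice_id": 42, "amount": 100.0, "margin": 0.31}

# Restriction lives at the edge: which fields each user role may see.
VISIBLE_FIELDS = {
    "customer": {"invoice_id", "amount"},                 # never sees margin
    "account_manager": {"invoice_id", "amount", "margin"},
}

def internal_fetch():
    # Internal call: always returns the full record, regardless of who
    # triggered the process.
    return dict(FULL_RECORD)

def edge_response(user_role):
    # Edge call: filters the record down to what this user may see.
    record = internal_fetch()
    allowed = VISIBLE_FIELDS[user_role]
    return {k: v for k, v in record.items() if k in allowed}

print(edge_response("customer"))  # {'invoice_id': 42, 'amount': 100.0}
```

The internal systems never have to reason about who asked; only the edge does.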

In real life, we don't restrict data access to the credentials of the person who initiated a request.  So why do it when we automate the real-life processes?  For example, if I go to the bank and ask them to look into a questionable charge on my credit card, there is no doubt that the instigator of the request is me.  However, I do not have access to the financial systems.  A person, acting on my behalf, may begin an inquiry.  That person will have more access than I have.  If they run into a discrepancy, they may forward the request to their manager, or an investigator, who has totally different access rights.  If they find identity theft, they may decide to investigate the similarity between this transaction and a transaction on another account, requiring yet another set of access rights. 

Clearly, restricting this long-running process to the credentials of the person who initiated it would hobble the process. 

So in a SOA infrastructure, what security level should an application have?

Well, I'd say it depends on how much you trust that application, not on how much you trust the people who use it.  Therefore, applications have to be granted a level of trust, and have to earn that level somehow.  Perhaps it is through code reviews?  Perhaps through security hardening processes or network provisioning?  Regardless, the point is that the application, itself, is an actor.  It needs its own level of security and access, based on its needs, separate from the people on whose behalf it acts.
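One way to picture the application-as-actor idea: give each application an earned trust level, and require a minimum level for each kind of data.  The levels, names, and thresholds below are hypothetical, purely to show the shape of the check.

```python
from enum import IntEnum

class Trust(IntEnum):
    # Hypothetical trust ladder an application might climb.
    UNTRUSTED = 0
    REVIEWED = 1    # e.g. passed code review
    HARDENED = 2    # e.g. passed security hardening and network provisioning

# Minimum trust required to read each kind of data (illustrative).
REQUIRED_TRUST = {"invoice": Trust.REVIEWED, "payroll": Trust.HARDENED}

# Trust each application has earned (illustrative).
APP_TRUST = {"ReportingApp": Trust.REVIEWED}

def app_can_read(app, data_kind):
    # The check is on the application's own credential, not the end user's.
    return APP_TRUST.get(app, Trust.UNTRUSTED) >= REQUIRED_TRUST[data_kind]

print(app_can_read("ReportingApp", "invoice"))  # True
print(app_can_read("ReportingApp", "payroll"))  # False
```

Note that the end user never appears in this check; the application earns (or fails to earn) the access on its own.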

And how do you manage that?  Do you assign an application access to a specific database?  Microsoft IT has thousands of databases, and thousands of applications.  The Cartesian product alone is enough to make your head spin.  Who wants to maintain a list of millions of data items?  Not me.

No, I'd say that you grant access for an application against a Data Subject Area.  A Data Subject Area is an abstraction.  It is the notion of the data as an entity that exists "anywhere" in the enterprise in a generic sense.  For example: A data subject area may be "invoice" and it covers all the systems that create or manage invoices.  This is most clear in the Canonical Data Model, where the invoice entity only appears once.
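A sketch of what granting by Data Subject Area might look like: instead of a database-by-database matrix, each application holds a small set of subject-area grants drawn from the canonical model.  The services and subject areas named here are hypothetical.

```python
# Hypothetical grants: applications are authorized against canonical
# Data Subject Areas, not against the thousands of physical databases
# that happen to hold that data.

SUBJECT_AREA_GRANTS = {
    "InvoicingService": {"invoice", "customer"},
    "ShippingService": {"order", "shipment"},
}

def app_may_read(app_name, subject_area):
    # One check covers every system that stores this subject area.
    return subject_area in SUBJECT_AREA_GRANTS.get(app_name, set())

print(app_may_read("InvoicingService", "invoice"))  # True
print(app_may_read("ShippingService", "invoice"))   # False
```

Whether "invoice" lives in one database or fifty, the grant is stated once, against the single invoice entity of the canonical model.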

Since applications should only integrate and share information using the entities of the canonical data model, would it not, therefore, make sense to align security access to the canonical data elements as well?

I'll continue to think on this, but this is the direction I'm heading with respect to "data security in the cloud."

Your feedback is welcome and encouraged.