Enterprise IT Integration and Data Security in the Canonical Data Model?


I spend a lot of time staring at a single problem: how to make a large number of systems "speak" to one another without creating piles of spaghetti links and buckets of operational complexity.


 So this past week, I've been thinking about security in the integration layer.


At Microsoft, we have a lot of competing business interests.  One company may be a Microsoft Partner in one channel, a customer in another, and a competitor in a third.  (IBM is a perfect example, as is Hewlett-Packard.  We love these guys.  Honest.  We also compete against them.)  To add to the fun, in the vast majority of cases, our Partners compete with each other, and we need to absolutely, positively, with no errors, maintain the confidentiality and trust that our partners have in us.  To protect the 'wall' between Microsoft and our partners, and between our partners and each other, in competitive spaces, while still allowing open communication in other spaces, we have some pretty complicated access rules.  These rules apply not only to customer access, but also to how the account managers in Microsoft, who work on the customers' behalf, can access internal data.  For example, an account manager assigned to work with Dell as an OEM (a Microsoft employee) cannot see the products that Hewlett-Packard has licensed for its OEM division, because he or she may accidentally expose sensitive business information between these fierce competitors.


In this space, we've developed a (patented) security model based on the execution of rules at the point of data access (Expression-Based Access Control, or EBAC).  This allows us to configure some fairly complicated rules to define what kind of data a customer may directly access (or an employee may access on behalf of their customers).  So I'm looking at the EBAC components as well as more traditional Role-based Access Control (RBAC) and thinking about integration.
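
To make the idea of "rules executed at the point of data access" concrete, here is a minimal sketch in Python.  It is purely illustrative and is not the patented EBAC design, whose details are not described in this post.  The rule, the field names (`account`, `assigned_accounts`) and the sample records are all assumptions made up for the example.

```python
from typing import Callable, Iterable

# A rule is an expression over (requester, record), evaluated when data is read.
Rule = Callable[[dict, dict], bool]


def assigned_to_account(requester: dict, record: dict) -> bool:
    """Hypothetical rule: a requester sees only records for accounts assigned to them."""
    return record["account"] in requester["assigned_accounts"]


# Hypothetical rule set; a real deployment would presumably load this from configuration.
RULES = [assigned_to_account]


def read_licenses(requester: dict, records: Iterable[dict]) -> list:
    """Return only the records for which every configured rule evaluates to True."""
    return [r for r in records if all(rule(requester, r) for rule in RULES)]


# Made-up sample data: an account manager assigned to Dell only.
dell_manager = {"name": "account manager", "assigned_accounts": {"Dell"}}
licenses = [
    {"account": "Dell", "product": "example product 1"},
    {"account": "Hewlett-Packard", "product": "example product 2"},
]
visible = read_licenses(dell_manager, licenses)
```

With this sample data, the Dell account manager would see only the Dell record, which mirrors the OEM example above.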


 What right does any application have to see a particular data element?


This gets sticky. 


I can basically see two models. 


Model 1: The automated components all trust one another to filter access at the service boundary, allowing them to share data amongst themselves freely. 


Model 2: Every request through the system has to be traced to a credential, and the data returned in a call depends heavily on the identity of the person instigating the request.


Model 1 is usually considered less secure than model 2.


I disagree. 


I believe that we need a simple and consistent infrastructure for sharing automated data, and that we should move all "restriction" to the edge, where the users live.  This allows the internal systems to have consistently filled, and consistently correct, data elements, regardless of the person who triggered a process.
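
As a rough sketch of what "restriction at the edge" could look like, assume an internal service that always returns complete data and a thin edge layer that is the only place where the end user's identity matters.  The record type, the service function and the sample values below are hypothetical names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseRecord:
    """Hypothetical canonical record shared freely between internal systems."""
    partner: str
    product: str
    units: int


def internal_license_service() -> list[LicenseRecord]:
    """Internal call: returns every record, regardless of who triggered the process."""
    return [
        LicenseRecord("Dell", "example product 1", 1000),
        LicenseRecord("Hewlett-Packard", "example product 2", 2500),
    ]


def edge_view(user_partners: set[str]) -> list[LicenseRecord]:
    """Edge call: the only place where the end user's identity restricts the data."""
    return [r for r in internal_license_service() if r.partner in user_partners]
```

The internal systems always work with the full, consistent data set; only `edge_view` narrows it for a particular user.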


In real life, we don't restrict data access to the person who initiated a request.  So why do it when we automate the real life processes?  For example, if I go to the bank and ask them to look into a questionable charge on my credit card, there is no doubt that the instigator of the request is me.  However, I do not have access to the financial systems.  A person, acting on my behalf, may begin an inquiry.  That person will have more access than I have.  If they run into a discrepancy, they may forward the request to their manager, or an investigator, who has totally different access rights.  If they find identity theft, they may decide to investigate the similarity between this transaction and a transaction on another account, requiring another set of access rights. 


Clearly, restricting this long-running process to the credentials of the person who initiated it would hobble the process. 


So in a SOA infrastructure, what security level should an application have?


Well, I'd say, it depends on how much you trust that application.  Not on how much you trust the people who use it.  Therefore, applications have to be granted a level of trust and have to earn that level somehow.  Perhaps it is through code reviews?  Perhaps through security hardening processes or network provisioning?  Regardless, the point is that the application, itself, is an actor.  It needs its own level of security and access, based on its needs, separate from the people that it is acting on behalf of.


And how do you manage that?  Do you assign an application access to a specific database?  Microsoft IT has thousands of databases, and thousands of applications.  The Cartesian product alone is enough to make your head spin.  Who wants to maintain a list of millions of application-to-database grants?  Not me.


No, I'd say that you grant access for an application against a Data Subject Area.  A Data Subject Area is an abstraction.  It is the notion of the data as an entity that exists "anywhere" in the enterprise in a generic sense.  For example: A data subject area may be "invoice" and it covers all the systems that create or manage invoices.  This is most clear in the Canonical Data Model, where the invoice entity only appears once.


Since applications should only integrate and share information using the entities of the canonical data model, would it not, therefore, make sense to align security access to the canonical data elements as well?
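
One way to picture this is a grant table keyed by application and Data Subject Area rather than by application and database.  The sketch below assumes hypothetical application names and a single "invoice" subject area; it is an illustration of the idea, not a description of any existing Microsoft system.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical grants: application -> data subject area -> allowed access.
# The application itself is the actor; no end-user identity appears here.
GRANTS = {
    "billing-app": {"invoice": {Access.READ, Access.WRITE}},
    "reporting-app": {"invoice": {Access.READ}},
}


def is_allowed(app: str, subject_area: str, access: Access) -> bool:
    """Check an application's grant against a canonical subject area."""
    return access in GRANTS.get(app, {}).get(subject_area, set())
```

Because the table is indexed by subject area, it grows with the number of canonical entities rather than with the number of physical databases that happen to hold invoices.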


I'll continue to think on this, but this is the direction I'm heading with respect to "data security in the cloud."


Your feedback is welcome and encouraged.

Comments (8)

  1. I think model two satisfies compliance regulations better than model one: tracing who did what and when. It has to do with identity management. If a user X uses service A, which invokes service B, which invokes service C, you might want to know that service C was initiated by user X and followed the path A-B-C, just because you have to by law.

    Once you have implemented such an identity management infrastructure, you might use it to apply access rules based on it, e.g. permissions based on the path A-B-C.

    Not only users have identities, but services as well. Every user and service adds its own credentials to the message in the path, creating a SAML credentials chain in the message. It makes your services more autonomous, because it doesn’t matter where the service resides. Access – not only to data, but also to services – can then be controlled by federated identity management.

    I planned to publish a posting on this pattern at http://soa-eda.blogspot.com. I hope I will shortly find some time to translate this draft (among many others) from Dutch into English.

    Why didn’t the world accept Dutch as a universal language instead of English? Grrrr… 😉

  2. NickMalik says:

    Hi Jack,

    For one thing, I’d separate the concerns of auditing from access control.  It is useful to know who-did-what after the fact without having to prevent a system from doing, on my behalf, something that I cannot do myself.

    I think, in computing, we have forced our computers to be the most basic servant of the user, with a focus on "do what I tell you to do."  However, as system complexity rises and the maturity of the business rules grows, we will come to the point where systems have to be treated as full agents, having more privilege than the user who instigated a request.

    I agree, though, that the auditing requirements do not disappear in either model.  The message must CONTAIN the information about who is doing what.  I’m just not convinced that is the best way to control access to controlled data INSIDE the infrastructure.  At the edge, yes.  In the middle of the system, no.

    — N

  3. Udi Dahan says:

    Agreed on the issue of auditing as separate from access control.

    Also, you bring up a great point as to the ramifications of viewing databases and the applications that access them as separate. I consider services that entirely encapsulate their data/databases to be one benefit of the SOA model. This also makes security easier.

    One system I’m consulting on actually has each Service defined as a "user". Nothing but goodness has come of that. Also, a single "data subject area" would fall under the responsibility of a single service.

    Great post!

  4. NickMalik says:

    @Udi,

    as far as the system you are consulting on:

    "a single data subject area would fall under the responsibility of a single service"

    Excellent!

    It is so good to hear that good design does exist in production, and that good people are succeeding in this space.  It is such a shift for many in IT.

    Thanks for the feedback.

  5. In a SOA, to be effective, we need to share both data and events. Events, as I have discussed before

  7. Curt Devlin says:

    Passing messages is a hallmark of service-orientation, but we usually distinguish events as a special subset of message passing patterns. A notification, for example, is a simplex message exchange pattern; but even simplex notifications can be further subdivided into unicast and multicast. Certain messages should only be sent to me, while others may be sent to many. I think we are oversimplifying if we think that authorization only pertains to certain phases of these event patterns.

    General policy may permit me to receive a certain class of notifications, but deny me access to specific messages in that class for other reasons. For example, if I am Director of Operations, I should receive all termination notices so that I can revoke the appropriate permissions, UNLESS the notice is about a person above me in my reporting chain. In this scenario, I’m an authorized subscriber to termination notices, but denied specific messages from that subscription for other reasons.

    I can easily envision scenarios that would require authorization at many different junctures within the event patterns. At a minimum, both publication and subscription could require authorization in some cases. I may be permitted to publish a class of messages only to a certain set of subscribers, for instance. Mechanically, some event patterns involve several phases of sub-events.

    Some events engender other events in a chain. However, downstream events are likely to have a completely different set of access control requirements. I am allowed to publish a message only to an editor or group of editors. Perhaps editors are allowed to publish to specific distributors, who in turn publish to “public” subscribers. In effect, a collection of publications and subscriptions is part of a larger workflow pattern. There are abundant examples of this family of authorization scenarios in the real world. For instance, some manufacturers are not permitted to publish information about their products because this causes channel conflict with their retailers. Similar policy requirements will pop up in electronic event publishing as well.

    Events furnish a really rich and interesting area for exploring authorization. We should resist the temptation to rush to conclusion at the risk of oversimplifying the problem domain, at least until we have determined that supporting the more complex cases is highly intractable. I think authorization claims and secure tokens can and will be packaged with event messages but there are other, equally important checkpoints within this family of patterns that are also valuable and probably required for authorization.

  8. NickMalik says:

    @Curt,

    I’m confused about your example.  If I am a Director of Operations responsible for revoking credentials when a network user is terminated, then I would need to react MORE QUICKLY on the termination of an executive, given the higher level of access to key business data.  (This was certainly the model at one of my employers.)  In fact, I would expect that the Director himself would need to collect the laptop and secure the area to ensure no thumb drives exist with key business plans.  True: in some cases, hierarchy affects role, but that example is not a good one.

    That said, the system itself needs to decide if the message is relevant for the system to decode.  Therefore, the event notification has to contain as little information as possible… just enough to know the kind of event and the context that is placed upon it.  That allows the decision logic to be independent of the security infrastructure.  

    To support your example, I’d say that the event message would go out as "event: employee termination, access level: executive, publicity level: private, termination id: 82344321" and then the systems responsible for revoking credentials can ask for information on the termination itself.  A portal that shows employee profiles may not care about it until another event appears showing the publicity level to be public.  

    I would not have the permissions handled at the infrastructure level, but rather at the endpoint level.  The goal is to keep the logic out of the infrastructure.  Therefore, there is no restriction against subscription to an event.  There would only be restriction against the request for more information.

    I also disagree with the tiered model of publication that you appear to be suggesting.  I have no problem with publishing an event that only editors care about.  I have a big problem with saying that I only have the right to publish to editors.  I am not publishing to editors.  I am publishing to systems that act on behalf of editors.  A system that shares events with the public would only act on messages that should be shared with the public.  It is not up to my system, or the editor’s system, to make that decision.  If it were, then our business rules would be tightly coupled across systems, which is the antithesis of SOA.

    I have not and will not close my mind to these ideas.  However, my overriding concern here is to make things as simple as possible but no simpler.  Assuming complexity is a trap.  I am concerned that your response appears to take the other tack: assume complexity until you can prove that simplicity works.  That is not a thought process that I normally subscribe to.
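
The termination-event exchange described in the last comment can be sketched roughly as follows.  The event carries only the four fields quoted in the comment; any system may subscribe, and authorization happens only when a system asks the owning endpoint for more information.  The system names, the trusted set and the detail record are hypothetical placeholders, not part of the original discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Thin notification: just enough to know the kind of event and its context."""
    kind: str
    access_level: str
    publicity_level: str
    termination_id: int


# Hypothetical detail store owned by the endpoint that published the event.
_DETAILS = {82344321: {"employee": "placeholder employee record"}}

# Hypothetical set of systems trusted to read details of private terminations.
_TRUSTED_FOR_PRIVATE = {"credential-revocation"}


def request_details(system: str, event: Event) -> dict:
    """Endpoint-level check: restrict the request for details, not the subscription."""
    if event.publicity_level == "private" and system not in _TRUSTED_FOR_PRIVATE:
        raise PermissionError(f"{system} may not read private termination details")
    return _DETAILS[event.termination_id]


def portal_should_act(event: Event) -> bool:
    """A profile portal decides for itself: it ignores the event until it is public."""
    return event.publicity_level == "public"


event = Event("employee termination", "executive", "private", 82344321)
```

In this sketch the publisher never decides who may receive the event; each receiving system applies its own rule, and the only enforced restriction sits at the endpoint that owns the details.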
