SOA is like the Night Sky…

I frequently find that people have a different perspective of computing than I do.  When I think about it, I am struck by the independence and separation that different systems (i.e. independent services) have from each other.  In many ways, this reminds me of the night sky.

Imagine a single service in which you have a bunch of code and a bunch of data.  In this discussion, I am using the word service as a bar from the previous blog.  By that, I mean the aspect of a service dealing with explicit boundaries and autonomy.  Being a service with explicit boundaries, there is a collection of data completely encapsulated within it.  This is a set of tables that is owned by this service (bar) and not any other service.  Under no circumstances do we have a transaction (i.e. an atomic database transaction with ACID properties) that spans the data contained in these tables and stuff outside of this service (or bar).

So, we can perform an ACID transaction against the fully contained data (inside the fully contained set of tables) and that’s about it.  We also have the ability to transactionally consume incoming messages and transactionally enqueue outgoing messages.  What we cannot do is directly examine or directly modify the data in any other service.  We must rely on messaging to humbly request that the other service supply an image of some data (again via messaging) or that it accept our humble request that it consider the possibility of performing a business operation that may result in some change to the remote service’s data.  Sometimes a partner will regularly send us messages containing images of some data that it wishes to export.  These images of data are, of course, images of the way the data looked at the time of transmission to us.  The data has since been unlocked and may have changed since it was transmitted.
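The transactional consume-and-enqueue pattern described above can be sketched in a few lines. This is a minimal illustration, not anything from the post: the SQLite store, the table names, and the message shapes are all invented, and a real service would have a separate sender that drains the outbox.

```python
import sqlite3

def handle_message(db: sqlite3.Connection, msg_id: str, order_id: str) -> None:
    with db:  # one local ACID transaction: all of this commits or none of it
        # Consume the inbound message exactly once.
        cur = db.execute("DELETE FROM inbox WHERE msg_id = ?", (msg_id,))
        if cur.rowcount == 0:
            return  # already processed; ignore the duplicate delivery
        # Change only data this service owns...
        db.execute("UPDATE orders SET status = 'accepted' WHERE id = ?", (order_id,))
        # ...and enqueue the reply; a separate sender drains the outbox later.
        db.execute("INSERT INTO outbox (body) VALUES (?)",
                   (f"order {order_id} accepted",))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (body TEXT)")
db.execute("INSERT INTO inbox VALUES ('m1')")
db.execute("INSERT INTO orders VALUES ('o9', 'new')")
handle_message(db, "m1", "o9")
```

Because the inbox delete, the local update, and the outbox insert share one transaction, a crash leaves either all three effects or none, without ever needing a transaction that spans two services.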

So let’s examine this again.  The act of sending a message by the remote service involves its unlocking of the records containing the data being transmitted.  The message to our service is prepared and as it is enqueued for sending, the data being copied is unlocked.  This is a natural consequence of the fact that the two services do not share transactions.  It is impossible to see the current value of a distant record!  Just as it is impossible to know if the sun blew up 5 minutes ago (since it takes 8 minutes to know at the speed of light), it is impossible to see the current value of a record in a distant service.  You can see how the sun was 8 minutes ago and you can see the value of the record at the time it was unlocked.  This is a new phenomenon that was not part of our psyche when we were just plain old building mainframe apps or client-server apps.  In those days, we could see God’s Truth about the state of the record.  We began a transaction, looked at the record, acted on the information gleaned by looking at the record, and committed the transaction (which unlocked the records).

Computing within an SOA is like the night sky.  We know the state of the stuff within easy reach (we can lock that stuff that is in our service).  Just like when we look up in the sky and see the light from this star that is 10 years old (the star is 10 light years away) and the light from that star is 10,000 years old, we look at the messages received from remote services and consider what they used to look like.  Based upon this stale (and of various vintage) information, and the stuff we know about the local data we have, we perform a transaction.  This transaction makes changes to the local data and enqueues outgoing messages.  This is our star shining forth its light to arrive later at our partner services.
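One way to picture the light-of-various-vintage idea in code: every image of remote data arrives stamped with the moment it was copied, and the only honest question the receiver can ask is how old that view is. This is purely an illustrative sketch; the `Snapshot` class and its fields are invented.

```python
import datetime

class Snapshot:
    """An image of remote data, as of the moment the sender unlocked it."""
    def __init__(self, value, as_of: datetime.datetime):
        self.value = value          # the image of the remote record
        self.as_of = as_of          # when the sender copied and unlocked it

    def age(self, now: datetime.datetime) -> datetime.timedelta:
        # Like starlight: we can only say how old this view is,
        # never what the remote record looks like right now.
        return now - self.as_of

sent = datetime.datetime(2004, 5, 20, 12, 0, 0)
snap = Snapshot(value={"price": 10}, as_of=sent)
now = datetime.datetime(2004, 5, 20, 12, 8, 0)
print(snap.age(now))  # 0:08:00 -- an eight-minute-old view, like sunlight
```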

In summary, computing in SOA is like the night sky.  We act on our local information and the information we know about what the other services used to look like.  While this is a different computing model, it is the only possible computing model in a loosely-coupled and distributed world.



Comments (13)

  1. Sean McGrath says:

    Excellent and very useful analogy. I tend to use a service-as-registrar analogy to illustrate similar ideas.

    You ask the registrar a question and you get an answer. The answer you get is what the registrar had as "current data" at the time it processed your request. Its notion of "current data" is independent of any enqueued transactions (pending updates) that may change the current value.

    By the time you get the current value, the registrar’s notion of current value may well have changed, but c’est la vie in a distributed world.


    Sean McGrath

  2. Mike Julier says:

    If knowing the "God’s Truth" is of value, the service providing the data should be designed to allow for (timed out) locking of that value across messages. So, there would be four queues of requests on the data (read, read w/ lock, write, unlock). As long as we agree that I may unlock the data without your permission (but will indicate I did so as a response to your "unlock" message), this could still work.

  3. Pat, what you are saying is very clear. Please could you comment on what appears in a related MS blog by Ramkumar Kothandaraman, which seems contradictory:


    Marcus Mac Innes

  4. John says:

    This seems overly simplified to me. You say ‘data being copied is unlocked’.

    In my mind I have always considered that simply by having data I have an implicit read-lock on that data. This is an optimistic read-lock, but a lock nevertheless. If I am not in the sole execution context (basically ‘logical thread’) that manages this data, my optimistic read-lock can become stale any moment after I have received it. In short, *data is a read-lock*. It’s what you know, and you have to act on it until you know it’s now obsolete and you have been wasting your time.

    You say the ‘act of sending a message by the remote service involves its unlocking of the records containing the data being transmitted’, but this is not true; it is issuing an optimistic read-lock. Optimistic concurrency is not a new idea.

    If I were providing a service-interface (foo) that only supported read operations, then I guess I could disregard these locks and not care about the fact that my clients’ data is becoming stale. Most real-world services (bars) will also receive requests for alterations to data via messages on their service-interfaces (foos). Pessimistic locking doesn’t strike me as feasible in any way, shape, or form for a distributed system (where a ‘distributed system’ is over a network where one node can fail (or degrade) independently of the entire system), but clients of my service must know what data I have before they can send me a request to update it. If their read-lock has become stale, I must fail them with a concurrency error, and then force them to begin again or move them into the ‘merge’ process. This is not a new idea, and in my view it doesn’t fall outside the scope of ‘transaction management’. Is SOA simply a new vocabulary? What’s wrong with the one that we already have? Why is ‘SOA like the night sky’? Isn’t it just like optimistic concurrency?

    Since a client can take these read-locks, they are implicitly involved in a distributed transaction whenever they hold data they requested from a service where there might be some intention to post a message to the service based on the contents of that data. This type of distributed transaction’s ACID principles still apply, but there is always the risk of processing or viewing stale data (because we don’t serialize access to data). These ideas don’t strike me as really new or groundbreaking. A paradigm where a service maintained a record of state known to all clients and managed a message dispatch system that let them know that ‘the sun just blew up’ (or more likely ‘data you hold a read-lock on just got modified’) would be, but optimistic concurrency is not.
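    The stale-read-fails-with-a-concurrency-error flow described above boils down to a version check. A rough sketch, with all names invented for illustration:

```python
class ConcurrencyError(Exception):
    pass

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        # Handing out data is, in effect, handing out an optimistic read-lock.
        return self.value, self.version

    def update(self, new_value, expected_version):
        if expected_version != self.version:
            # The caller's view went stale: fail, forcing a re-read or merge.
            raise ConcurrencyError("stale read; begin again or merge")
        self.value = new_value
        self.version += 1

rec = Record("open")
value, ver = rec.read()
rec.update("closed", ver)   # succeeds: the caller's view was still current
```

    A second caller still holding the old version would now get a `ConcurrencyError` on update, which is exactly the "fail them and force them to begin again" step.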

    By the way, if the Sun blows up it is telling all its clients as soon as it can about that change in state (all practical latency aside). By virtue of time and motion there isn’t such a thing as ‘real-time’ when you have more than one execution context, but there is ‘as close to real-time as possible’ (invariably race conditions will need to be dealt with). I’m pretty sure that SOA doesn’t imply that all services will notify all clients about a change in state that they would have an interest in at the speed of light (but the Sun would if it blew up (and I didn’t even ask for a read-lock, I just get one, it streams its state at me as fast as it can)).

    Also, I’m still not really comfortable with these exploding layers of abstraction that all do the same thing. For example, you define bar as: "a collection of data and logic that is completely isolated from anything else except through incoming messages. A bar has explicit boundaries and is autonomous. Typically (i.e. in real applications), a bar is implemented as a bunch of code surrounding a set of tables in a single database."

    That sounds like a function to me. Oh, and a class. Oh, and an API. Oh, and a process. Oh, and an operating system. But apparently it’s this new and ‘different’ thing..? Concurrency has been an issue with all types of messaging paradigms with multiple execution contexts. The context could be threads, local processes, remote processes and beyond (into real life if you like), basically any situation where you can lose a deterministic order of events.

    Schema, contract and policy also seem to me to have been around for a very long time, at many different levels. Didn’t the word for this used to be ‘type’?

    If SOA is going to be thrown around as the flavour of the month, I reckon it’d be worth admitting that it was something far more specific. Give it real concrete bounds, not vague wishy-washy ones that could just as easily describe how my class relies on the ‘services’ of a function, my API relies on the ‘services’ of a class, my process relies on the ‘services’ of an API, my OS relies on the ‘services’ of a process, etc.

    Rebranding old ideas isn’t progress. It’s marketing.

    If service is foo, and service-interface is bar, then SOA is snafu.

  5. A very interesting post by Microsoft’s Pat Helland that emphasizes the difference between exposing internal data transactionally vs. message-based. I tried to make some of the same points (I think) some time ago.   [via Savas’ blag]…

  6. Hartmut Wilms says:

    Very good analogy, which explains the "autonomous" tenet of SOA very well. Thanks. However, there are some open issues for me.

    1.) "So, we can perform an ACID transaction against the fully contained data (inside the fully contained set of tables) and that’s about it." I’m not really sure what you mean by saying transactions must not be shared by services. In my opinion some services must share transactions, i.e. two business services take part in a single transaction (they share a common transaction context), although their contained data sets are disjoint. Maybe this is only a misunderstanding, but in my view distributed transactions also have to include two or more services (calls). At least a service should be able to consult another service within a single transaction …

    2.) I am very interested in your (missing) response to Marcus Mac Innes’ post.

    Separating service data is a very tough job. Sometimes you may be forced to (completely) redesign your services, because data which at first seemed to be unrelated is in fact related … The notion of autonomous business services appeals to me. In my view services are described by their behaviour – their SERVICE – rather than their data. By the way, this is the most common mistake in object-oriented development: objects or classes are very often modeled by their data. The service offered by a service 🙂 may share its data with other services by a common (logical) data model. This could get messy, you have a point there. There may be a solution to this problem, which adheres to your opinion as well.

    In my current project we have divided services into two main kinds:

    a) Refdata services, which solely offer read-only business request services. These requests return reference data, which is shared by all other services. This data is represented by unconnected "value objects" or XML documents.

    b) Business services, which implement business workflows or actions of those workflows or use cases. These services do not share their data with other services, i.e. no other service may directly request or alter these data.

    I do not know if this solution will work in every case …
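    A hypothetical sketch of the refdata/business split described in a) and b), with invented names: the refdata service hands out disconnected copies (value objects), and the business service keeps its data private.

```python
from copy import deepcopy

class RefDataService:
    """Read-only reference data, shared with everyone as detached copies."""
    def __init__(self):
        self._countries = {"DE": "Germany", "IE": "Ireland"}

    def get_countries(self) -> dict:
        # Return a disconnected value object, never a live reference,
        # so callers cannot reach back into this service's data.
        return deepcopy(self._countries)

class OrderService:
    """A business service: its data is reachable only through requests."""
    def __init__(self):
        self._orders = {}            # private; no other service reads this

    def place_order(self, order_id: str, country_code: str) -> str:
        self._orders[order_id] = country_code
        return f"order {order_id} accepted for {country_code}"

refdata = RefDataService()
orders = OrderService()
countries = refdata.get_countries()   # a snapshot, free to go stale
print(orders.place_order("o1", "DE"))
```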

  7. Mark O says:

    This brings up a lot of interesting questions when it is extended to the user interface. If you really can’t tell what the state is on the other side, how do you inform the user of what they can and cannot do? Enabling menu items based on some state reported by the service is a problem.

  8. Joe B says:

    Very nice analogy. I’ve been thinking about all the stuff that can go wrong with compensation. (Double faults, subsystems within the external system already running with the updated data, etc.)

    Kind of like rolling back a supernova….

  9. I find that the terms ‘loosely coupled’ and ‘asynchronous’, as scalable system structuring concepts, are interpreted by many developers as ‘using non-blocking techniques’ to build the individual software components. These are somewhat related but in essence different concepts.
    Blocking and non-blocking in communication are programming paradigms, not essential distributed systems concepts. All communication as we know today is non-blocking or asynchronous. Any notion of blocking or synchrony is a programming construct that was introduced as a developer friendly programming paradigm to handle request/response patterns or to achieve guarantees such as reliability or stability. But in reality, at the lowest level messages are always non-blocking.
    Both blocking and non-blocking programming styles have their applications, and the choice is often a matter of design preferences combined with the limitations of the target operating system and its thread packages. As shown in <a href="">Capriccio</a>, you can have many threads with blocking operations and still have a well controlled system. And sometimes it is just easier to fork off a process if your thread system is not good enough; for example, in the <a href="">Flash web server</a> there are helper processes that do the path parsing using the lstat system call, which is only available in a blocking fashion. It was easier to construct this in a helper process than to convert it into the pure event-driven asynchronous structure of the main server.
    The clear notion that all operations are in essence asynchronous is obscured by the fact that some operating systems only provide synchronous interfaces to their asynchronous core. If your OS only gives you a blocking connect call, you’re seriously handicapped in making good design decisions. Often we then see developers construct a non-blocking layer over the blocking system calls, just to get back to the true asynchronous events as they also occurred in the core of the operating system.
    Whether you have an explicit request producer and a response consumer, or whether you have layered a synchronous software layer on top of this with the help of some parallelism, is a programming decision. And the level of synchrony can be completely application defined: for example, consider the case where a document gets submitted to the printer; whether the initiator is not blocked at all, is blocked until the printer has accepted the job, or is blocked until the printer has completed the job, is entirely up to the designer/programmer; there is nothing in the communication with the printer that forces the caller to wait.
    In my view there is, however, a serious disadvantage to using synchronous programming techniques, even if they are just layered over a purely asynchronous system and enough parallelism is used to ensure that the overall application will never block. There is the misguided desire to make remote and local operations transparent, which is possible if you program using synchronous techniques. If you want to pretend that a remote call is identical to a local call, then you run into trouble, because there are performance and failure scenarios that are not covered if you are not aware of the distributed nature of the application. So for me the main disadvantage is that synchronous operations tend to obscure the distributed aspects of a system. I have written about this before (see "<a href="">six misconceptions about reliable distributed computing</a>"), but it also is the conclusion of papers such as "<a href="">A Note on Distributed Computing</a>" by Jim Waldo and friends.
    The appeal of asynchronous, message-oriented middleware is that it makes distribution explicit. And if you allow the asynchronous message paradigm to be used as a structuring methodology for all of your application, then it is easy to make all stakeholders aware that this is a distributed application, not a local application with distribution hidden under the covers.
    Sometimes your process will just have to wait for another (remote) resource to complete an operation, there is no way around that. You can use multiple threads + blocking call or a "send message + back to eventloop + process response" to do "check_creditcard", both will work. Most important is that you are aware of what is happening under the covers when you issue the call.
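    As a small illustration of that last point, here is a sketch showing the same remote exchange once in the event-loop style and once behind a blocking wrapper, using Python's asyncio. The check_creditcard operation is an invented stand-in for a remote call:

```python
import asyncio

async def check_creditcard(number: str) -> bool:
    # Stand-in for a remote, inherently asynchronous exchange.
    await asyncio.sleep(0)
    return number.startswith("4")

# Event-loop style: send the request, yield to the loop, process the reply.
async def event_loop_style() -> bool:
    return await check_creditcard("4111")

# Blocking style: the very same asynchronous exchange,
# wrapped so that the caller simply waits for the answer.
def blocking_style() -> bool:
    return asyncio.run(check_creditcard("4111"))

print(blocking_style())
```

    The messages underneath are identical in both versions; only the programming paradigm presented to the caller differs, which is exactly the point being made.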
    But most important: they are programming paradigms, not networking technologies or immutable distributed systems components. And a problem is that often we let our programming dictate the way we see functionality or component interactions.
    To build scalable systems we have to step away from what our programming preferences dictate to us and involve ourselves in a thinking adventure about how components really need to and can interact on a global scale. I have read many papers and articles on this topic, but I consider Pat Helland to be almost the only person who really grasps what the complexity of enterprise systems design is going to be like if you really need to scale systems. See for example his "<a href="">SOA is like the night sky</a>" article. This is the kind of visionary thinking that should motivate all of us to drop our RPC or multicast battle axes and start thinking about what large systems really should look like.