More Discussion of SOA is like the Night Sky…


I received some thoughtful commentary from someone named John and I thought I would share his comments and some of my responses with you all.  I’m still trying to get used to Blogging and don’t know how to give better attribution to John than his first name.  Here goes:


 


Sender: John


====================================


 


re: SOA is like the Night Sky…


 


[John] This seems overly simplified to me. You say ‘data being copied is unlocked’.


 


[John] In my mind I have always considered that simply by having data I have an implicit read-lock on that data. This is an optimistic read-lock, but a lock nevertheless. If I am not in the sole execution context (basically ‘logical thread’) that manages this data my optimistic read-lock can become stale any moment after I have received it. In short, *data is a read-lock*. It’s what you know and you have to act on it until you know it’s now obsolete and you have been wasting your time.


 


[Pat] This is a great question (or comment).  The issue cuts to what you believe about a service and its relationship to the outside world.  My assertion is that a distrusting relationship with autonomy WILL NOT include data updates of the backend database (even with optimistic concurrency control).  A distrusting service will insist on verifying the behavior implied by incoming work through its business logic.  I’m fully aware that optimistic concurrency control can be made to work across distances and without holding locks.  The issue is that this is unsafe behavior.


 


[Pat] If I (a service) value my data, I’m not going to let others change it.  This is about independence, autonomy, and trust.  The IRS does not let me perform optimistic concurrency control against their backend database when I post my tax return.  The premise behind services is a loose coupling and distrust between the participating services.  In my opinion, this means that a read/write semantic against backend data is completely unacceptable.


 


[John] You say the ‘act of sending a message by the remote service involves its unlocking of the records containing the data being transmitted’ but this is not true, it is issuing an optimistic read-lock. Optimistic concurrency is not a new idea.


 


[Pat] Again, it’s about the semantics available across distrusting boundaries, not the ability to use optimistic concurrency control.


 


[John] If I was providing a service-interface (foo) that only supported read operations then I guess I could disregard these locks and not care about the fact that my client’s data is becoming stale. Most real world services (bars) will also receive requests for alterations to data via messages on their service-interfaces (foos). Pessimistic locking doesn’t strike me as feasible in any way, shape or form for a distributed system (where a ‘distributed system’ is over a network where one node can fail (or degrade) independently of the entire system), but clients of my service must know what data I have before they can send me a request to update it. If their read-lock has become stale I must fail them with a concurrency error, and then force them to begin again or move them into the ‘merge’ process. This is not a new idea, and in my view it doesn’t fall outside the scope of ‘transaction management’. Is SOA simply a new vocabulary? What’s wrong with the one that we already have? Why is ‘SOA like the night sky’? Isn’t it just like optimistic concurrency?
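To make the mechanism John describes concrete, here is a minimal sketch of a version check on update. It is only an illustration under stated assumptions: `RecordService`, `Snapshot`, `ConcurrencyError`, the record key, and the phone numbers are all hypothetical names and sample values, not anything taken from a real product.

```python
from dataclasses import dataclass


class ConcurrencyError(Exception):
    """Raised when an update is based on a stale copy of the record."""


@dataclass(frozen=True)
class Snapshot:
    """What a client holds after a read: the data plus the version it saw."""
    key: str
    phone: str
    version: int


class RecordService:
    """Hypothetical service that owns its records and checks versions on update."""

    def __init__(self):
        self._records = {}  # key -> (phone, version)

    def create(self, key, phone):
        self._records[key] = (phone, 1)

    def read(self, key):
        phone, version = self._records[key]
        return Snapshot(key, phone, version)

    def update(self, snapshot, new_phone):
        _, current_version = self._records[snapshot.key]
        if snapshot.version != current_version:
            # The client's copy is stale: someone else updated the record first.
            raise ConcurrencyError(f"{snapshot.key}: stale version {snapshot.version}")
        self._records[snapshot.key] = (new_phone, current_version + 1)


service = RecordService()
service.create("Acme Inc", "555-2212")

first = service.read("Acme Inc")    # both clients read version 1
second = service.read("Acme Inc")

service.update(first, "555-1234")   # succeeds; the record is now version 2
try:
    service.update(second, "555-0000")  # based on version 1, so it is rejected
    outcome = "accepted"
except ConcurrencyError:
    outcome = "rejected"
```

The second client is the one John says must "begin again" or be moved into a merge process; the sketch stops at raising the error and leaves that recovery step out.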


 


[Pat] If this were about optimistic concurrency control, we would be pursuing a reincarnation of the same behavior.  In that case, you would be correct that it is simply inventing a new vocabulary.  It is not about the same behavior, though.


 


[Pat] SOA is about interacting with a business-function semantic.  It is also about the assumption that when you do your business function, it is only connected via messaging.  This leads us to a style of interaction that is reminiscent of the way we interact with businesses.  I may place a hotel reservation (and, perhaps, later on cancel that reservation).  I don’t fiddle with the hotel’s backend database records.
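One way to picture the difference is a service whose only entry points are business requests. The sketch below is an assumption-laden illustration, not a description of any real reservation system: `HotelService`, its method names, and the shape of the reply dictionaries are all hypothetical.

```python
import itertools


class HotelService:
    """Hypothetical service: callers send business requests, never row updates."""

    def __init__(self, rooms_available):
        self._rooms_available = rooms_available
        self._reservations = {}            # confirmation number -> guest name
        self._next_id = itertools.count(1)

    def place_reservation(self, guest):
        # The service's own business logic decides whether to accept the request.
        if self._rooms_available == 0:
            return {"status": "declined", "reason": "no rooms available"}
        self._rooms_available -= 1
        confirmation = next(self._next_id)
        self._reservations[confirmation] = guest
        return {"status": "confirmed", "confirmation": confirmation}

    def cancel_reservation(self, confirmation):
        if confirmation not in self._reservations:
            return {"status": "declined", "reason": "unknown reservation"}
        del self._reservations[confirmation]
        self._rooms_available += 1
        return {"status": "cancelled", "confirmation": confirmation}


hotel = HotelService(rooms_available=1)
booked = hotel.place_reservation("Pat")
refused = hotel.place_reservation("John")      # the only room is already taken
cancelled = hotel.cancel_reservation(booked["confirmation"])
rebooked = hotel.place_reservation("John")     # accepted after the cancellation
```

Nothing in the interface lets a caller read or write the room count or the reservation table directly; the caller only learns whether its request was confirmed, declined, or cancelled.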


 


[Pat] This is why there’s a lot of excitement about SOA.  While it has been done before we came up with a new name (e.g. EDI, MQ, etc), it has not been worked on with the same intensity and with the same hope for broad impact.  What you posit for interaction (with optimistic concurrency control over direct access to the partner’s data) is definitely not SOA.


 


[John] Since a client can take these read-locks, they are implicitly involved in a distributed transaction whenever they hold data they requested from a service where there might be some intention to post a message to the service based on the contents of that data. This type of distributed transaction’s ACID principles still apply, but there is always the risk of processing or viewing stale data (because we don’t serialize access to data). These ideas don’t strike me as really new or ground breaking. A paradigm where a service maintained a record of state known to all clients and managed a message dispatch system that let them know that ‘the sun just blew up’ (or more likely ‘data you hold a read-lock on just got modified’) would be, but optimistic concurrency is not.


 


[Pat] Same comment as above.  The interaction is not about record reading and is not about ACID transactions that span the services (“bar”s).


 


[John] By the way, if the Sun blows up it is telling all its clients as soon as it can about that change in state (all practical latency aside). By virtue of time and motion there isn’t such a thing as ‘real-time’ when you have more than one execution context, but there is ‘as close to real-time as possible’ (invariably race conditions will need to be dealt with). I’m pretty sure that SOA doesn’t imply that all services will notify all clients about a change in state that they would have an interest in at the speed of light (but the Sun would if it blew up (and I didn’t even ask for a read-lock, I just get one, it streams its state at me as fast as it can)).


 


[Pat] In fact, I am trying to point out that SOA is about looseness between the services.  The behavior of a collection of services should be identical even if one of them goes up and down intermittently (of course with the exception that the responsiveness of the collections of services is impacted).  The use of queues for the messages that connect the services allows for a great deal of tolerance of intermittent availability.  Amazon.Com (and most scalable web sites) have a scale-out front end and a centralized back end.  Browsing and shopping happen on the front end.  When you push SUBMIT, a message is enqueued for delivery to the back end system.  Normally, you get an email from the back end system a few seconds later.  Sometimes the back end system is down for a while and you get the email in an hour or so.  You still get your books.
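The queuing behavior described above can be sketched in a few lines. This is a toy, in-memory illustration only: `FrontEnd`, `BackEnd`, the `available` flag, and the order names are hypothetical, and a real system would use a durable queue rather than a `deque`.

```python
from collections import deque


class BackEnd:
    """Hypothetical order processor that may be down for a while."""

    def __init__(self):
        self.available = True
        self.confirmed = []  # orders processed so far (stands in for the emails sent)

    def process(self, order):
        self.confirmed.append(order)


class FrontEnd:
    """Accepts orders at any time by enqueuing them for the back end."""

    def __init__(self, backend):
        self._backend = backend
        self._queue = deque()

    def submit(self, order):
        # SUBMIT only enqueues; it never waits for the back end.
        self._queue.append(order)

    def deliver_pending(self):
        # Drain the queue in arrival order, but only while the back end is up.
        while self._queue and self._backend.available:
            self._backend.process(self._queue.popleft())

    def pending(self):
        return len(self._queue)


backend = BackEnd()
frontend = FrontEnd(backend)

backend.available = False          # the back end goes down
frontend.submit("book-1")
frontend.submit("book-2")
frontend.deliver_pending()         # nothing can be delivered yet
pending_while_down = frontend.pending()

backend.available = True           # the back end comes back
frontend.deliver_pending()         # the queued orders are processed in order
```

The outcome is the same as if the back end had never gone down; only the delay before confirmation changes.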


 


[Pat] So, one of the ideas is to tolerate the fact that these systems are in different time domains as much as possible.  That is the opposite of believing it is at the speed of light. 


 


[John] Also, I’m still not really comfortable with these exploding layers of abstraction that all do the same thing. For example, you define bar as: “a collection of data and logic that is completely isolated from anything else except through incoming messages.  A bar has explicit boundaries and is autonomous.  Typically (i.e. in real applications), a bar is implemented as a bunch of code surrounding a set of tables in a single database.”


 


[John] That sounds like a function to me. Oh, and a class. Oh, and an API. Oh, and a process. Oh, and an operating system. But apparently it’s this new and ‘different’ thing..? Concurrency has been an issue with all types of messaging paradigms with multiple execution contexts. The context could be threads, local processes, remote processes and beyond (into real life if you like), basically any situation where you can lose a deterministic order of events.


 


[Pat] Oh, would that this were a function, a class, or (dare I say it) a component.  When was the last time you saw a system in which all of the components minded their own business and didn’t fiddle with the same data that other components fiddled with?  When I look at enterprise applications that manage data stored in the database, I see lots of different components fiddling with the same data.  Furthermore, you have to look pretty far to actually be able to group a collection of database tables and a collection of code into a chunk where no transaction spans this chunk and that chunk.


 


[Pat] Some of this may be a community perspective.  Language and object folks don’t think about the disjointedness of the database data that must exist to have true encapsulation.  They think about member variables and avoiding the use of globals (which is a fine thing).  If, however, this component and that component fiddle with the same database records, what kind of encapsulation does your component have?


 


[John] Schema, contract and policy also seem to me to have been around for a very long time, at many different levels. Didn’t the word for this used to be ‘type’?


 


[Pat] Not an unreasonable perspective but missing a few things.  As we are trying to hook together our services, we do not want to be as chatty as we have been with component-based systems.  It becomes important to package up a big request and ship it off to the services with the possibility of as much looseness and independence in the definition as possible.  I totally agree that schema is very much like types and, indeed, very much like interfaces.  Interfaces are almost always finer-grained and chattier.  Interfaces did not include notions comparable to contract or policy, though.  The interface did not say the allowable order of method calls.  Similarly, interfaces and types did not attempt to address the domain that policy is targeting.
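As a rough illustration of a contract that states the allowable order of messages, consider the sketch below. The message types, the state names, and the `Conversation` class are invented for the example; they are one possible reading of "contract", not a definition of it.

```python
class ContractViolation(Exception):
    """Raised when a message arrives out of the order the contract allows."""


# Hypothetical contract: which message types may follow each conversation state,
# and which state each accepted message leads to.
ALLOWED = {
    "start": {"place_order": "ordered"},
    "ordered": {"pay": "paid", "cancel": "closed"},
    "paid": {"ship": "closed"},
    "closed": {},
}


class Conversation:
    """Tracks one conversation with a service and enforces message order."""

    def __init__(self):
        self.state = "start"

    def receive(self, message_type):
        next_state = ALLOWED[self.state].get(message_type)
        if next_state is None:
            raise ContractViolation(
                f"'{message_type}' is not allowed in state '{self.state}'"
            )
        self.state = next_state
        return self.state


good = Conversation()
good.receive("place_order")
good.receive("pay")
final_state = good.receive("ship")

bad = Conversation()
try:
    bad.receive("ship")            # shipping before any order was placed
    rejected = False
except ContractViolation:
    rejected = True
```

A type or interface declaration would list `place_order`, `pay`, `cancel`, and `ship` as available operations, but it would not by itself say that `ship` is invalid before `pay`.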


 


[John] If SOA is going to be thrown around as the flavour of the month, I reckon it’d be worth admitting that it was something far more specific. Give it real concrete bounds, not vague wishy-washy ones that could just as easily describe how my class relies on the ‘services’ of a function, my API relies on the ‘services’ of a class, my process relies on the ‘services’ of an API, my OS relies on the ‘services’ of a process, etc.


 


[John] Rebranding old ideas isn’t progress. It’s marketing.


 


[Pat] I think there is a difference.  I will totally grant you that there is ambiguity about the crisp delineation between what a service is and what a component is amongst the SOA crowd.  We are debating these issues and trying to form a consensus just as the object folks went through some churn during their formative times.


 


[Pat] It is my opinion that there is a substantive difference and, hence, this is more than simply rebranding.  I would be pleased to have your commentary and to do my best to address anything you have to say in this forum.  I am pleased as can be about your firmness in expressing your opinions and concerns!


 


[John] If service is foo, and service-interface is bar, then SOA is snafu.


 


[Pat] I had intended that a bar be a service and a foo be a service-interface in my earlier discussion.  As I said, these terms are subject to discussion and redefinition.  Still, I don’t think that’s a big part of your concern compared to the issues discussed above.


 


[Pat] Thanks, again, for your vociferous comments!  Please let me know what you think!


 


Source: http://weblogs.asp.net/pathelland/archive/2004/03/18/91825.aspx#92268

Comments (31)

  1. Andrew D says:

    > [John] Rebranding old ideas isn’t progress. It’s marketing.

    John, I enjoyed your comments – this is not a flame.

    Authors of any great work are just re-arranging the words that have been in a dictionary for centuries. It is the fresh arrangement of these words that opens our minds to new thoughts.

    Sometimes even giving a new name to a collection of old ideas is still progress. For example the fiefdoms presentations stated things we all knew – but didn’t know we knew – or at least didn’t know how to join the dots. The ideas presented there I use every day.

  2. John says:

    If you want the executive summary: "SOA is like a DBMS without pessimistic concurrency".

    > I received some thoughtful commentary from someone named John and I thought I would share his comments and some of my responses with you all. I’m still trying to get used to Blogging and don’t know how to give better attribution to John than his first name. <

    Hi there. Yep, 202.59.100.138, that’s me.. I wouldn’t have thought it was general etiquette to publish my IP address, but then I don’t think SOA is like the night sky either.. bit late for worrying about it now.

    I like to pretend that I’m invisible. I like the idea of just having a discussion based on its merits, without the need to bring a whole lot of irrelevant stuff into the picture, like ‘who I am’ for example. Or maybe I’m worried that I’ll piss people off (heh, it’s a gift) or say something dumb and you’ll hold it against me later, or maybe I’ve just been in a bit of a nihilist mood lately, but if failing to identify myself properly is rude, or silly, or likely to start a big mystery, then: nslookup blog.jj5.net

    I’m pretty interested in this conversation though and I’ve been waiting with bated breath for a response.. 🙂

    > My assertion is that a distrusting relationship with autonomy WILL NOT include data updates of the backend database (even with optimistic concurrency control). <

    Then your service is necessarily read-only. My assertion remains: data is a read-lock.

    The reality tends to be that if you have data you need to update it. Unless your service is generating that data it needs to source it from somewhere. In order to get data from somewhere you have to leave your execution context, meaning that you need a method for dealing with concurrency. (When I say ‘execution context’, I’m talking about ‘serial execution’ or ‘sequential execution’. I’d say thread, but I want a term more generic than ‘thread’. At the end of the day even a thread is only an abstracted execution context: down in the hardware, locking, blocking and switching are maintaining the integrity of this ‘abstract serial execution context’ for you. I pulled ‘execution context’ out of the air, but I think it’s a fair enough term.) There are really only two ways to deal with concurrency: optimistic and pessimistic. I don’t want to get hung up on typical ways for implementing these types of concurrency mechanisms because I’m sure we all know about them, but in general pessimistic concurrency serializes access to data (i.e. one at a time), while optimistic concurrency allows for non-serialized access to data (i.e. many execution contexts can be reading and modifying the same data at the same time).

    Obviously if multiple execution contexts are modifying the ‘same data at the same time’ they are working on copies of the data. There is still "God’s Truth" about the data, but it is (or at least *should be*) well and truly protected by your service. The only execution context that can know "God’s Truth" about the data is the service, and it is responsible for serializing access with a form of pessimistic concurrency (locking, queuing, etc) to maintain its integrity. So, at the basic level a DBMS is a service. You pass it messages (SQL statements) and it ‘serializes’ (as in ensures the ACID principles are met, usually by guaranteeing ‘serial’ processing of queued messages at an abstract level) those for you. The problem with a DBMS by itself is that (not for lack of trying for decades) it doesn’t provide a rich or secure enough service just by itself. It’s ‘easier’ to wrap the DBMS in another ‘service’ where business logic can be implemented imperatively, typical stuff like application level authentication, authorisation, complex validation, etc. A DBMS is essentially a ‘multi-purpose’ environment, and this is a ‘bad thing’ because developers or users can easily undermine attempts at maintaining data integrity (when your DBMS can’t (or doesn’t) model all possible constraints), thus we create a ‘service’.

    In my view, a service is just an ‘application server’. It receives messages and processes them. Simple right? All it does is ‘restrict’ what you could do on the DBMS.

    > A distrusting service will insist on verifying the behaviour implied by incoming work through its business logic. I’m fully aware that optimistic concurrency control can be made to work across distances and without holding locks. The issue is that this is unsafe behaviour. <

    Wait a minute. You said ‘optimistic concurrency control can be made to work across distances and without holding locks’. But I said ‘data is a read-lock’. They are contradictory assertions. I’m not willing to change my mind. So, we have a vocabulary problem. If we aren’t talking the same language we can’t communicate (this is actually a very hard philosophical problem that I don’t believe can be adequately overcome, so let’s not over-analyse this just now, let’s just assume it is possible for us to have meaningful communication (whatever you think that means)).

    When you think of a ‘lock’ you are thinking of pessimistic concurrency. That’s fine, because that’s a normal and typical way to view a lock. Such a lock is ‘explicit’. You acquire the lock, and once you have it you know that you are the only holder of that lock. While you have the lock you can modify the data. The thing about an ‘optimistic lock’ is that it is ‘implicit’ and you have it as soon as you have a copy of some data. If that data doesn’t come with a mechanism for determining the ‘version’ of the data then you can’t have optimistic concurrency. You need a method of knowing when you lose a race with another execution context so you can deal with the problem. This is different to trying to avoid getting into a race in the first place, but it is still ‘locking’ in the sense that it is a mechanism for dealing with concurrency.

    I guess my argument is that you are using the term ‘lock’ to mean a pessimistic lock, and don’t want to concede my point that ‘data is a read-lock’. In my view I’m right, so change your mind or change mine (the latter might be quite difficult). 😉

    > If I (a service) value my data, I’m not going to let others change it. <

    What you mean is: If I value my data, I’m not going to let others change it without going through me.

    Data still needs to be changed.

    > This is about independence, autonomy, and trust. <

    At the end of the day all it’s ever really about is data integrity.

    > The IRS does not let me perform optimistic concurrency control against their backend database when I post my tax return. <

    Posting your tax return is in many ways a ‘write-only’ operation. You provide your primary key (in Australia that’s a Tax File Number) and a whole heap of ‘new’ data. Say I posted my tax return twice, what then? This isn’t a great example, because the operation is ‘mostly’ a ‘write’ operation. I’ll give a better example below.

    > The premise behind services is a loose coupling and distrust between the participating services. <

    Maybe. Once again, so is anything else. My functions are ‘supposed’ to be loosely coupled and not ‘trust’ their input, and so on..

    > In my opinion, this means that a read/write semantic against backend data is completely unacceptable. <

    Well then, how do you propose I update the telephone number for my client Acme Inc? My ‘CRM service’ controls all access to data to ensure integrity, it doesn’t support pessimistic concurrency, and it does support multiple clients (aka consumers or users).

    I have multiple users. I therefore implicitly have more than one execution context. It’s still the same old problem, I could pass messages around like this:

    Bob to Boss: Hi, I’m from Acme Inc. We have a new phone number.

    Boss to Bob: Sure Bob, what is it?

    Bob to Boss: 555-1234

    Boss to Bob: No problems Bob, I’ll get someone to update this right away.

    Boss to Jane: Acme Inc’s new phone number is 555-1234, please update our records.

    Jane to Service: REQUEST "Acme Inc"

    Sally to Joe: Hi, I’m from Acme Inc. There is a problem with some field on my latest report.

    Service to Jane: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: ABC"

    Joe to Service: REQUEST "Acme Inc"

    Service to Joe: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: ABC"

    Joe to Sally: Is some field ABC?

    Jane to Service: UPDATE "Key: Acme Inc, Phone: 555-1234, SomeField: ABC"

    Sally to Joe: No. It’s supposed to be XYZ.

    Jane to Boss: No problems Boss. I’ve updated the CRM Service.

    Joe to Service: UPDATE "Key: Acme Inc, Phone: 555-2212, SomeField: XYZ"

    Joe to Sally: Thanks Sally, I’ve updated your details in our CRM Service.

    Boss to James: Call Acme Inc and tell them they owe us $1,000,000.

    James to Service: REQUEST "Acme Inc"

    Service to James: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: XYZ"

    James to 555-2212: Hi, you owe us $1,000,000

    555-2212 to James: You must have the wrong number.

    James to Boss: I rang Acme Inc but apparently we have the wrong number.

    Boss to James: Hmm, Jane told me she updated it this morning.

    James to Boss: Yeah, well we all know that Jane is stupid and doesn’t know how to use the computer.

    Boss to Jane: You told me you updated the system this morning.

    Jane to Boss: I did. I entered the new phone number.

    Boss to Service: REQUEST "Acme Inc"

    Service to Boss: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: XYZ"

    Boss to Jane: Well the old phone number is still in the system. You must have done something wrong.

    James to Helen: Yeah, Jane is hopeless. She always gets things wrong.

    Jane to Boss: I did it. I promise.

    Boss to Jane: We’ll have a meeting about this later. Come and see me at 3pm.

    Helen to Boss: A lot of people have told me that Jane is stupid.

    Boss to Jane: This is the last time. You’re fired.

    Or something equally tragic. The point is, a service needs to share data. In this example, each message was handled transactionally, was atomic, met all business rules, etc., yet we still violated our data integrity: Joe’s update was based on a copy he read before Jane’s update, so it silently overwrote her new phone number. You can’t have a service like this without optimistic concurrency. You can’t protect your data from your clients when you have to share your data with them!

    > Again, [SOA is] about the semantics available across distrusting boundaries, not the ability to use optimistic concurrency control. <

    Once again, that’s too simple. I have to establish trust. I have to authenticate and authorize. A service that never trusts anything will never do anything. An interface at any level of abstraction (or in any sense of the word) is a boundary.

    My complaint is simply that nothing here is new. Messaging is not new. Transactions are not new. Data integrity is not new. Establishing trust is not new. Exposing an interface is not new. Defining a contract as part of the interface is not new. Hiding implementation behind an interface is not new.

    (When I say interface btw I mean it in the most general sense, generally I consider ‘interface’ and ‘contract’ inextricably tied to each other and inseparable, and I wish the rest of the world did too)

    > If this were about optimistic concurrency control, we would be pursuing a reincarnation of the same behaviour. In that case, you would be correct that it is simply inventing a new vocabulary. It is not about the same behaviour, though. <

    Once again, if you have more than one execution context you need a mechanism for managing concurrency. There are two ways, optimistic and pessimistic. I suspect you are so caught up in new words that you are forgetting that the concepts are not new.

    > SOA is about interacting with a business-function semantic. It is also about the assumption that when you do your business function, it is only connected via messaging. This leads us to a style of interaction that is reminiscent of the way we interact with businesses. I may place a hotel reservation (and, perhaps, later on cancel that reservation). I don’t fiddle with the hotel’s backend database records. <

    Nothing here is new. By the way, you *are* actually fiddling with the hotel’s backend database records. I’m pretty sure that if I book a reservation at your hotel then your backend database records have changed. I reckon if I cancel they change too. Having a ‘front-end’ to a database isn’t anything new. Nor is having a ‘middle-tier’ that wraps access to it. I reckon if I cancel and my wife rings up the hotel at the same time to make sure our room is a smoking room (because I didn’t tell her I was cancelling yet, you know, waiting for the right time) then I have a race condition. If my wife wins the race condition then I’ll end up cancelling a reservation for a smoking room. If I win the race condition then my wife will be told that the reservation has been cancelled. That’s optimistic concurrency, each of us had the ‘booking number’ and ‘what we thought we knew about the state of the booking’, my wife probably wouldn’t have asked to change the room for a booking she knew was cancelled.

    > This is why there’s a lot of excitement about SOA. <

    I don’t know why people are suddenly excited about old ideas.

    > While it has been done before we came up with a new name <

    Yep. New name. Marketing. Buzz-word.

    > it has not been worked on with the same intensity and with the same hope for broad impact. What you posit for interaction (with optimistic concurrency control over direct access to the partner’s data) is definitely not SOA. <

    I’m not sure what you mean by ‘direct access’, but obviously your service provides data to clients. The service is the client’s interface to that data, and it can pass messages to get it and alter it. The messages are queued and validated. Again, SOA is just a buzz-word that offers nothing new.

    > The interaction is not about record reading and is not about ACID transactions that span the services ("bar"s). <

    The only way that services can communicate is with ‘messages’ that contain ‘data’. Data is a read-lock. If I have data I am implicitly involved in a distributed transaction with the service. So, you do have ACID transactions that span the services. In fact, all your service does is guarantee the ACID principles: Atomicity, Consistency, Isolation, Durability. It does it with distributed transactions that rely on optimistic concurrency. Yes, a single message needs to contain everything needed for an atomic transaction on the server. Anything less granular will be treated as a sequence of acceptable ‘transaction states’ and is fundamentally ‘what your application does’ or ‘what your business logic is’; such state transitions are necessarily not atomic.

    > In fact, I am trying to point out that SOA is about looseness between the services. <

    Loose-Coupling. Not new.

    > The behaviour of a collection of services should be identical even if one of them goes up and down intermittently (of course with the exception that the responsiveness of the collections of services is impacted). <

    Atomicity, Consistency, Isolation, Durability. Not new.

    > The use of queues for the messages that connect the services allows for a great deal of tolerance of intermittent availability. <

    Queuing, Messaging. Not new.

    > Amazon.Com (and most scalable web sites) have a scale-out front end and a centralized back end. Browsing and shopping happen on the front end. When you push SUBMIT, a message is enqueued for delivery to the back end system. <

    Queuing, Messaging, Amazon. Not new.

    > Normally, you get an email from the back end system a few seconds later. Sometimes the back end system is down for a while and you get the email in an hour or so. You still get your books. <

    Atomicity, Consistency, Isolation, Durability, Queuing. Not new.

    > So, one of the ideas is to tolerate the fact that these systems are in different time domains as much as possible. That is the opposite of believing it is at the speed of light. <

    The speed doesn’t matter. I made my point that if you have more than one execution context you have to deal with race conditions. You can race fast or slow, you’re still racing. These problems are as old as time. The reason I mentioned the speed of light was because I was talking about how the Sun sends us data. It streams state at us ‘as fast as it can’. A paradigm in a distributed system where a server notified a client ‘as fast as possible’ of changes to data the client held a read-lock on would be kind of new. Sure, the idea of ‘pub/sub’ is not new, but an overbearing architecture that did its best to save clients from wasting their time working on stale data (like something that sent Joe a message telling him that Jane had just modified data he still had loaded in his client application) would certainly be new. As it stands all SOA is proposing is that we stop using pessimistic locks and go through one interface to work with data. Yes it is a good idea. No it is not a new idea. My gripe with this particular buzz-word is that it only seems to be showing me that people don’t already know what I had assumed they already knew, for example: why are we having this conversation? Surely we know about the virtues of the ACID principles, how to use queues to get data into a specific execution context, how to define an interface that exports schema, contract and policy. Let’s use an example of an ‘interface’ in one programming sense of the word:

    public interface IAmSomething {
        Int32 PrimaryKey { get; }
        Int32 SomeRelatedKey { get; set; }
        void DoSomething();
    }

    Schema is stuff like the fact that I have a member called ‘PrimaryKey’ that is read-only whose type is a 32-bit signed integer.

    Contract is stuff like I must set SomeRelatedKey to a positive 32-bit integer that identifies a related entity prior to calling DoSomething.

    Policy in this case is simply that this is a public interface.

    So I define it in XML. I version it differently. Doesn’t matter, it’s still all the same concepts, and unless you are going to start talking about an ‘implementation of what you like to call SOA’ then you’re not telling me anything that I don’t already know (unless you are, in which case I don’t understand yet). I know about messaging, queuing, concurrency, ACID principles, etc.

    > Oh, would that this were a function, a class, or (dare I say it) a component. <

    So you agree then, that these principles that you are trying to tout as ‘SOA’ should apply equally to more discrete elements of a system’s design..?

    > When was the last time you saw a system in which all of the components minded their own business and didn’t fiddle with the same data that other components fiddled with? <

    Is that rhetorical? This is how I always strive to design elements in my system. If I have a system that supports more than a single execution context I always handle concurrency in the typical ways, depending on what suits me best. I like the messaging and queuing paradigm, and have been using it (along with optimistic concurrency) for years, in one form or another. I haven’t ever called it SOA before, but the expression is the only thing that is really new.

    If you agree that these are the same ideas that are supposed to be in functions, classes and ‘components’ and so on, and your observation is that they are not, then why is it that you think you’ll be able to get everyone to achieve these things suddenly just because you make the layer of abstraction larger? For my money, I’d prefer if Microsoft (and the software development community in general) tried to get this grass roots stuff right first rather than picking a buzz-word to carry them for a few months.

    > When I look at enterprise applications that manage data stored in the database, I see lots of different components fiddling with the same data. <

    Is that what you see? Data in a database like SQL Server for example? I’ll tell you right now that if your data is in SQL Server the only way to change that data is to go via SQL Server (short of getting out your hex editor and fooling with the MDF file). So there you have it, a single point of access (aka interface) to your data. Take away the DBMS’s ability to issue a pessimistic lock to a client, export all your access with stored procedures and there you have what you are trying to call ‘SOA’. Not new. You might be talking about new mechanisms for doing the same thing, but unless you admit that you are trying to find new mechanisms for doing the same thing then I don’t like your chances of succeeding (re-inventing the wheel isn’t easy). It’s not like ‘SOA’ is anything specific. If you said SOAP, or SQL for example, and how you might make it better or get it to do more, or ways to restrict it so it did less, or were speaking about some concrete technology or protocol then maybe we’d be talking about something new, but you’re just talking about abstract high-level message queuing paradigms that have existed for decades. So your messages are going to be in some XML based format, and you’re going to have a queue. Big deal. Is this necessarily better than them being in SQL and letting a DBMS queue them? Probably. It’s the same principle though. If you’re bumping messages from one DB to the next, to address scaling or availability concerns this isn’t really magical either.

    > Furthermore, you have to look pretty far to actually be able to group a collection of database tables and a collection of code into a chunk where no transaction spans this chunk and that chunk. <

    Yep. So you end up writing an ‘application server’ or ‘middle-tier’ that manages a whole heap of tables that you put in the same ‘database’. Not new.

    > Some of this may be a community perspective. Language and object folks don’t think about the disjointedness of the database data that must exist to have true encapsulation. They think about member variables and avoiding the use of globals (which is a fine thing). If, however, this component and that component fiddle with the same database records, what kind of encapsulation does your component have? <

    The same encapsulation it has always had: the ‘interface’ that it exports and the ‘contract’ that it defines. You think that writing two components that interact with a ‘service’ is different from writing two components that interact with a ‘DBMS’? A DBMS is a service. All you’ve done is decide to stop using pessimistic locking and call it something else. Does a DBMS not support the ideas of ‘schema’, ‘contract’ and ‘policy’? Is it not completely isolated except via incoming messages? You think a new layer of abstraction is going to help you? I totally agree that a middle-tier that manages (and limits) access to data is important and useful, I don’t agree that we need a new name for it.

    > [John] Schema, contract and policy also seem to me to have been around for a very long time, at many different levels. Didn’t the word for this used to be ‘type’?

    [Pat] Not an unreasonable perspective but missing a few things. As we are trying to hook together our services, we do not want to be as chatty as we have been with component-based systems. It becomes important to package up a big request and ship it off to the services with the possibility of as much looseness and independence in the definition as possible. I totally agree that schema is very much like types and, indeed, very much like interfaces. Interfaces are almost always finer-grained and chattier. Interfaces did not include notions comparable to contract or policy, though. The interface did not say the allowable order of method calls. Similarly, interfaces and types did not attempt to address the domain that policy is targeting. <

    From the dictionary [1]: Interface – the place at which independent and often unrelated systems meet and act on or communicate with each other.

    Says nothing about how ‘chatty’ they are. Encompasses everything, including schema, contract, policy, etc. Always has. Not new.

    > I think there is a difference [between SOA and functions, classes, etc.]. I will totally grant you that there is ambiguity about the crisp delineation between what a service is and what a component is amongst the SOA crowd. We are debating these issues and trying to form a consensus just as the object folks went through some churn during their formative times. <

    It’s a DBMS without external pessimistic locking. Not new.

    > It is my opinion that there is a substantive difference and, hence, this is more than simply rebranding. <

    Well, I think that it’s probably just a mix of marketing, love for buzz-words and a vain attempt at re-inventing the wheel. You’ll end up rewording what Codd wrote 30 years ago.

    > I would be pleased to have your commentary and to do my best to address anything you have to say in this forum. I am pleased as can be about your firmness in expressing your opinions and concerns! Thanks, again, for your vociferous comments! <

    You’re welcome. I’m enjoying this conversation too. Debate is the only way to make any progress. Sorry if I’m too short in places, I’m tired, busy, etc. The thing I’m mostly vexed about is that I struggle all the time with how to do these things better and the last thing I want to see is a bunch of sheep chanting ‘SOA’, ‘SOA’, and refusing to acknowledge that it doesn’t bring anything particularly new to the table. It doesn’t promise anything that can’t be, isn’t being or hasn’t already been done. A new data format or protocol might, but I can’t see a pressing need for one.

    If new words help you understand old concepts, then go for it, but I’d rather keep the language that has evolved over the years and that people already speak. Why can’t we just keep calling it ‘n-tier’ like we used to? It’s still even really just client-server between tiers, at the end of the day we are trying to maintain data integrity while servicing multiple execution contexts, our DBMS tends to be ‘multi-purpose’ and ‘flexible’ which seems good at first but bad eventually, so we wrap it with a ‘service’, also known as application-server, or middle-tier.

    John.

    [1] http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=interface

  3. John says:

    Er, spotted a typo.

    I said:

    The thing about a ‘pessimistic lock’ is that it is ‘implicit’ and you have it as soon as you have a copy of some data.

    I meant:

    The thing about an ‘optimistic lock’ is that it is ‘implicit’ and you have it as soon as you have a copy of some data.

    I’m sure there are others, but that one was sort of important.

    John.

  4. Ramkumar says:

    John,

    Your whole dissertation can be summarized in one statement ‘This is not new’. Nobody claims that this is new.

    And of course, your point that everything should reside in one database is just not practical. If you look at enterprise IT, all you see is a bunch of systems automating certain business functions. Some of them are off-the-shelf software like SAP, some of them are homegrown, some of them are acquired through acquisition, etc. SOA gives you a framework to deal with this heterogeneous environment. I can bet that you cannot explain how to get work done in this heterogeneous environment with your ‘everything is database’ opinion.

    SOA’s basic premises are

    1) Services are autonomous. Think about ERP and CRM systems and think about how your new application is going to work with these applications that YOU DON’T control.

    2) Heterogeneity is reality. You can’t assume that everyone is going to use the same RDBMS store. I know that it is ideal to have one big momma database that contains everything, but it is not a realistic scenario.

    3) Services evolve independently of each other. One CRM system may be replaced with another. How does your app work in this kind of environment?

    Of course, you don’t really have to struggle hard to explain that some state or data will change while fulfilling a business request. That is a tautology. After all, IT is about information.

  5. John Cavnar-Johnson says:

    This entire conversation reminds me of Alice in Wonderland where the denizens of that lovely land spin up wonderfully bizarre edifices of logic founded on very simple misunderstandings of reality. In this case, Pat and John share a common delusion that the relational database is reality ("God’s Truth" in John’s formulation). This idea is just silly and John’s phone number scenario is a perfect illustration of how pernicious it really is. At the beginning of the scenario, the database contains incorrect information about Acme’s phone number and SomeField. Obviously, the data in the database is reality seen as through a glass, darkly. The incorrect data is not the fundamental problem though. The application users try to correct the data, but they are defeated by the rather stupid application logic. When Joe updates SomeField, the application also updates the phone number, obliterating Jane’s change. From Joe’s perspective, this is idiotic. He wasn’t interested in whether or not the phone number had changed since he asked to see THE VALUE OF SOMEFIELD. He didn’t ask the application to change the phone number. The application decided that since Joe updated SomeField, he must update the phone number. I’ve developed applications where this sort of behavior would be literally criminal.
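
    The overwrite described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory row, the phone numbers and the function names (`save_whole_row`, `save_changed_fields`, `scenario`) are hypothetical and do not come from any application discussed in this thread. It contrasts a save that writes every field on the form with one that writes only the fields the user actually edited.

```python
# Hypothetical starting row; field names follow the scenario in the thread.
INITIAL_ROW = {"Phone": "555-0000", "SomeField": "old"}


def save_whole_row(db_row, original, form):
    """Naive save: writes every field on the form, changed or not."""
    db_row.update(form)


def save_changed_fields(db_row, original, form):
    """Save only the fields that differ from what the user was shown."""
    for key, value in form.items():
        if value != original[key]:
            db_row[key] = value


def scenario(save):
    """Jane and Joe both load the row, edit different fields, then save in turn."""
    db = dict(INITIAL_ROW)
    jane_original = dict(db)  # Jane's private copy of the row
    joe_original = dict(db)   # Joe's private copy of the row

    jane_form = dict(jane_original, Phone="555-1234")  # Jane fixes the phone
    joe_form = dict(joe_original, SomeField="new")     # Joe fixes SomeField

    save(db, jane_original, jane_form)
    save(db, joe_original, joe_form)
    return db


print(scenario(save_whole_row))       # Jane's phone change is lost
print(scenario(save_changed_fields))  # both changes survive
```

    Writing only the edited fields is one way to avoid clobbering Jane's change; it does not address the case where both users edit the same field, which still needs some form of concurrency check.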

    The broader question is why do applications operate this way? It’s because we (application architects) forget a very basic truth: Our relational data structures are a very limited projection of a complex and multi-dimensional reality into the two dimensional relational world. The relational model is defined by its mathematical correctness, thus ensuring that it will never adequately reflect human reality. Enforcing the rules of the relational model MAY be necessary to protect your application’s view of reality, but it is never sufficient. In the relational model, columns of a table are merely co-equal attributes of a relation, but this is never true in a real application. Phone and SomeField aren’t just attributes of a relation. They signify something to the users of the application, i.e. they are representative of meaning in the real world. We’ve gotten so wrapped up in ACID, concurrency, locking, etc. that we’ve forgotten this fundamental fact.

    John views the world through his relational prism and says SOA is nothing new, just an RDBMS without pessimistic concurrency. From that perspective, he’s absolutely right. I say, with Hamlet, "there are more things in heaven and earth than dreamed of in your philosophy." That relational prism filters out everything that’s important about SOA. SOA is not about enforcing the relational model, it’s about developing systems that more accurately reflect the true nature of the business process.

    Pat recognizes that SOA requires compromises to the relational model, but he still clings to the notion that the database is reality. He just thinks our services are a wee bit out of touch. I think SOA is like the night sky, but for reasons very different from Pat’s. SOA is when we recognize that the night sky is not the inverted sphere we thought it was, but a whole complex universe. If we’re just interested in the relative appearance of the stars’ locations, then the relational model still works. If we want to understand our place in the cosmos, we need a better way of interpreting reality.

    Before someone jumps in and says that business logic, implemented through a rich object model solves all of these problems, you will need to address the problems with that world-view that Pat has alluded to. He is spot on in his analysis of the fundamental incoherence of OO in the enterprise application world.

    As a final note, if anyone sees fit to reply to this, I would suggest tagging my comments with JohnCJ to differentiate them from the other participant named John.

  6. John says:

    Just a few more comments..

    So far, it has been said that ‘SOA’ is new, and that ‘SOA’ is not new. I don’t think that the concepts are new, it’s what we have been doing for a long time.. my gripe is that I keep hearing arguments or discussion about ‘what it is’ and it’s not anything new, so why don’t we just keep talking about stuff we already know.. it’s still N-Tier development in my view, and in many ways is client-server, since we shouldn’t really care what is ‘behind’ our services, we only care about their interface.

    I like some of the stuff that Don Box said:

    "If I share an abstraction it has a cost. And the fundamental premise of service orientation is that we try to control sharing of abstractions."

    But we’ve always known this right? Since I can remember I’ve been taught and agreed that you should limit access to your abstractions, it is how you accomplish loose-coupling, and we know that loose-coupling is a ‘good thing’.

    and:

    "Fundamentally there is a broader thing here which is this move towards service-orientation, or called service oriented architecture by, you know, people who want to charge more money for consulting."

    So, I’m not the only one that thinks SOA is just a buzz-word.

    Don has a big thing about ‘type’ and ‘schema’ that I don’t really share. I understand where he gets it from though. I still think it’s OK to say that ‘type’ is ‘schema’, ‘contract’ and ‘policy’. The problem is really only a ‘version’ problem. If I change my class’s implementation I change its version making it incompatible with the previous versions. We need finer-grained versioning, so the ‘schema’ and ‘contract’ and ‘policy’ can change or stay the same across binary versions of my classes that support them. He forces the point because in earlier versions of SOAP the type schema was heavily versioned. This is not a new problem, it’s surprising the extent that it was built into SOAP given the history of this serialization/version problem.

    In the .NET world the SOAP formatter will encode your ‘schema’ with a namespace comprised of your ‘class and version’. Meaning if you change a class’s version you effectively change the schema, even though the semantics have not changed. This is very, very painful. To use it effectively you really need to create an entire library simply to describe your schema so it can be versioned independently, and there aren’t great tools for this. Generally the types change because of code upgrades that fix bugs to come in-line with the expected ‘contract’ of a type, or to improve performance, etc. There might be a ‘consumer’ type and a ‘server’ type that share the same data but work with it differently (still honouring the implied ‘contract’ and enforcing any ‘policy’), etc. I agree that encoding your data with your type and version is madness, the schema changes far less often. That’s really why the point is so heavily made about ‘schema’. A DBMS is a good example of this because I can change the schema and not break everything that relies on unaltered parts of it, regardless of their ‘version’.

    I agree with Pat, most ‘services’ will use a relational backend. They don’t have to, but they typically will (for me anyway). A document management, or source control service may not for example.

    JohnCJ is right about the ‘goal’ of focusing on business process. Don is right that the best method is by *limiting* interfaces.

    One of my complaints is simply that these have always really been our goals. My other complaint is that of ‘transactions’ not being exposed by services. In the first instance I say that if you have data you have a read-lock (optimistic concurrency), meaning that anyone who can get your data can be involved in a transaction with you, that could span a great deal of time, and a great deal of messages. There are some ‘services’ which may simply *have to* issue pessimistic locks, for example a source control system during a batch merge where the user has to OK or fix several merge operations during a check-in.
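
    The ‘if you have data you have a read-lock’ idea is usually implemented by stamping each record with a version and rejecting writes made from a stale copy. The following Python sketch shows that technique under stated assumptions: the `Store` class, its method names and the `StaleDataError` exception are hypothetical, and the store is a plain in-memory dictionary rather than a real DBMS or service.

```python
class StaleDataError(Exception):
    """Raised when a writer's copy is older than the stored record."""


class Store:
    def __init__(self):
        self._records = {}  # key -> (version, value)

    def put_new(self, key, value):
        """Create a record at version 1."""
        self._records[key] = (1, value)

    def read(self, key):
        """Return (version, value); holding this pair is the implicit read-lock."""
        return self._records[key]

    def write(self, key, expected_version, new_value):
        """Apply the write only if the caller's copy is still current."""
        current_version, _ = self._records[key]
        if current_version != expected_version:
            raise StaleDataError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._records[key] = (current_version + 1, new_value)
        return current_version + 1


store = Store()
store.put_new("acme", "555-0000")

jane_version, _ = store.read("acme")
joe_version, _ = store.read("acme")

store.write("acme", jane_version, "555-1234")      # succeeds, record is now v2
try:
    store.write("acme", joe_version, "555-9999")   # Joe's copy is stale
except StaleDataError as err:
    print("rejected:", err)
```

    No lock is held between `read` and `write`, so a slow or vanished client costs nothing; the price is that a writer may discover only at write time that its work was based on obsolete data.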

    The problem with pessimistic locks is what happens if the client is slow or ‘disappears’; some clients are more reliable than others. However, I’m not sure that optimistic concurrency is right in every situation (although personally I’d try to use it all the time). I don’t think that ‘locking’ at some point necessarily means you aren’t a service; Visual SourceSafe is a (terrible) case in point.

    I don’t want to give anyone the impression that I don’t think service oriented architecture is a good idea, I just wanted to make the point that it isn’t new, and that most of the concepts apply equally to the more fine-grained aspects of your system.

    My point about the DBMS wasn’t really supposed to have anything to do with its ‘relational’ aspects. Yes, I think these are important, but I was really using it as an example of a service, it meets all the requirements of a service, its problem is that most implementations allow for too much of some things (mostly too flexible) and not enough of other things (security, complex validation, etc.), meaning that a wrapper is typically useful. The wrapper is obviously your service.. I stand by my point that you could create a ‘service’ simply using SQL Server and stored procedures. You can define schema, contract and policy and expose a limited set of functionality focused on business process. The problem is that generally this would be both inefficient and difficult to implement compared to using imperative code in a front end service, also exposing your service yourself means you can use other more flexible or ubiquitous protocols (i.e. not stuck with ODBC for example).

    As for "God’s Truth", it’s in quotes because it is facetious in many ways. Obviously data in a system is only as correct as users make it (hopefully not any less correct however!). The point is that it is the shared ‘authoritative’ source for the data. If someone knows something that isn’t in there, then they should put it in there.. even this isn’t new: ‘garbage in, garbage out’..

    Lastly, I think the ‘night sky’ is a poor analogy for a whole heap of reasons, stars don’t really ‘send requests’ (although gravity is interesting, did you know that there is a concept of a ‘gravity wave’? I used to have a theory that perhaps long-distance instantaneous communication might be possible with gravity, but it turns out that gravity ‘information’ doesn’t travel faster than the speed of light, it’s an inverse-square law, so distance massively diminishes its effects, however, we are actually being attracted to where a star ‘was’, not where it ‘is’, although I guess the attraction is pretty much negligible, I think Stephen Hawking has a theory that if something is far enough away from you in space-time it can never affect you, which is interesting (but not really relevant to stars we can see)), because the concepts of messaging are really crucial, and because the stars just ‘stream state’ they are really ‘read-only’ services, which doesn’t flag concurrency as an issue (and it definitely is).

    People in a team talking to each other might be though, a person can ask a question or make a statement, they might tell one or more people the same thing, each person will apply a set of rules to decide if they believe something or not, decide what they will do based on messages or requests, and generally just be discrete systems talking to other systems using some common messaging system (i.e. language) trying to get along and accomplish something.. Information will change and propagate in different ways, some people will work with old or invalid data until they find out their data is bad. If someone tells someone something containing old information they’ll probably be corrected, etc. The thing about humans is that they tend to process synchronously, they aren’t generally telling someone something they used to know at the same time they are finding out that it’s not true anymore (and sometimes they just lie, or stay silent, or talk to the wrong person, or are wrong, etc. basically they contain *bugs*, heh, it’s probably a feature).

    John.

  7. Marlon Smith says:

    Thanks Pat, I really enjoy reading your thoughts on SOA! Keep up the good work, any books in the making?

  8. Dino says:

    John is oversimplifying just a bit, from my perspective.

    > Schema is stuff like the fact that I have a member called ‘PrimaryKey’ that is read-only whose type is a 32-bit signed integer.

    Contract is stuff like I must set SomeRelatedKey to a positive 32-bit integer that identifies a related entity prior to calling DoSomething.

    Policy in this case is simply that this is a public interface.

    I think for SOA there are deeper meanings to Contract and Policy than you have summarized here. The restriction you proposed on the SomeRelatedKey, for example, sounds like schema to me. The Contract as I understood it defines the sequence and ordering of calls into the service. Like, when you go to a restaurant, you can’t pay the bill until you order your dinner. And the Policy is separately enforced, and a bit more involved than just data restrictions.
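
    The ‘contract as ordering of calls’ reading can be sketched as a small state machine. This is a minimal Python illustration of the restaurant example, not a description of how any SOA toolkit expresses contracts; the class name, state names and the `ContractViolation` exception are all hypothetical.

```python
class ContractViolation(Exception):
    """Raised when an operation arrives out of the agreed order."""


class RestaurantService:
    # The contract: for each state, which operations are allowed
    # and which state each one leads to.
    _CONTRACT = {
        "seated": {"order": "ordered"},
        "ordered": {"order": "ordered", "pay": "paid"},
        "paid": {},
    }

    def __init__(self):
        self.state = "seated"
        self.items = []

    def _advance(self, operation):
        allowed = self._CONTRACT[self.state]
        if operation not in allowed:
            raise ContractViolation(f"cannot {operation!r} while {self.state!r}")
        self.state = allowed[operation]

    def order(self, item):
        self._advance("order")
        self.items.append(item)

    def pay(self):
        self._advance("pay")
        return len(self.items)  # number of items billed


service = RestaurantService()
try:
    service.pay()                # paying before ordering breaks the contract
except ContractViolation as err:
    print("rejected:", err)

service.order("soup")
print(service.pay())             # allowed now that something was ordered
```

    The type of each argument (schema) is untouched by this table; what it captures is the allowable sequence of messages, which an interface definition alone typically does not state.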

  9. John Cavnar-Johnson says:

    The irony of this discussion is that I have the distinct feeling that Pat, John, and I build the same style of application. We agree much more than we disagree. And yet, I still see a huge problem in hanging on to our old way of thinking about applications.

    John is absolutely right when he says that SOA doesn’t change any of the fundamental principles of distributed application design. I suspect that Don and Pat would argue that the value of SOA is that it models those principles much more directly than client-server or distributed object technology ever did. Because they are in the business of building tools for the rest of us geeks, they are trying to achieve "The Pit of Success". [1] This is a good thing and I think it explains John’s reaction. He’s saying, "Hey, this stuff just reflects what I’ve been doing all along" and he’s right. SOA (conceived at that level) doesn’t have much to offer the seasoned, battle-scarred, veteran distributed application warrior. It just makes it a little easier for others to match our success.

    The transition to SOA can be something much more profound (Warning! When geeks use words like profound you should be very skeptical). It could serve as a new mental model for designing applications. There are two dominant mental models for application architecture. One is the OO approach and the other is the relational approach. In the OO community, the object model is all that matters. The relational database is just a necessary evil, a convenient persistence mechanism until we reach the OO nirvana of direct object storage. In the relational view, the ER diagram is the thing. Code is just a way of keeping all that mathematically impure "business illogic" out of our schema.

    In practice, of course, the dominant form of application architecture is "The Big Ball of Mud". [2] We run around at geek get-togethers wringing our hands about this and blaming the victims, but there is a very good reason this is true. Those dominant mental models don’t work very well for developing most real-world applications. Now, if you’re building simulation software or GUI frameworks, OO rocks. If you’re building a reporting application, the relational view is essential. Move beyond those and you quickly run up against the disconnect between your mental model and the way any organization larger than about 12-15 people works.

    As organizations grow, we create subgroups that focus on a subset of the overall goals (in business this may be sales, accounting, customer service, etc.). Then, we restrict the communication between groups by limiting the pathways and the content. For example, we tell the sales team that they can’t just call up the folks in inventory and start asking them random questions (Hey, Fred, how many widgets do we have in stock? Acme wants to buy 200 or 300. And they’ll need some gizmos to go with those widgets too, how many of those do we have?). Instead we develop stylized ways of interacting. You can submit Form 34X78D, Current Inventory Request Form to the Inventory Information Desk to get that information or you can submit a purchase order (Form 34Y52A) to the Order Processing Desk and those widgets and gizmos will be allocated for ACME.

    What’s important here is that it doesn’t matter whether Fred is a scruffy looking dude who walks out to the warehouse and counts the widgets and gizmos and uses a magic marker to write ACME on the right number of boxes OR Fred is a fully object-oriented J2EE middleware app talking to an IBM mainframe halfway around the world. The SOA conceptual model works either way because its key elements map directly back to the reality of bureaucracy.

    If we look at SOA from this perspective, the nature of those key elements is a lot clearer and the limitations of current technical approaches are highlighted. It’s not enough to have orchestrations, messages, contract, schema, and policy. We need to be able to define business documents that exist separately from the services that exchange them (something that is hard to do today with VS.NET and WSDL). These business documents need to have a conceptual identity separate from the schema that services apply (the schema for a purchase order being submitted for fulfillment is different than the schema for that same PO when it reaches billing). We shouldn’t be burying our knowledge of the actions that a service performs in the name of a web method call or SOAP header. We need to recognize that the authoritative record of the business data has to be in these documents, not in the relational store. The data type of a business element is a minor facet of its meaning, not the very essence of its existence.

    I don’t want to give up all the technical goodness we get from relational databases and modern programming techniques, but I do want to push them down to the implementation level so that they don’t drive application design. SOA gives us a chance to do that, but it won’t happen unless we break free from our outmoded mental models.

    [1] Rico Mariani, quoted by Brad Abrams

    http://blogs.msdn.com/brada/archive/2003/10/02/50420.aspx

    [2] http://laputan.org/mud/mud.html

  10. Stephane Rodriguez says:

    Since a lot of people seem to agree that SOA is nothing new, I might as well try to ask a few things:

    – being from a technical COM background, what’s wrong with QueryInterface when it comes to contract tie rather than implementation tie? While I understand the need for an online directory of services, I don’t see how the overhaul that SOA seems to imply fits into the little things that are needed to make COM more enjoyable in a connected world. Any light bulb out there?

    – from the application logic perspective, from the little I understand about SOA, I keep connecting it with the stateful/stateless terminology of server-side components (especially my pet, the BI back ends). I mean, a lot of the things that I read about SOA tend to make me think that they’ll happen only with stateless components. But how is this effective in the real world? Aren’t components inherently not stateless whenever they do something useful based on your profile, back-end third-party, …? Again, any light bulb out there?

    Thanks.

  11. John says:

    Executive Summary: I started ranting, but basically I’m talking about the need for optimistic concurrency and transaction support and asking about how results of requests are returned from services in the SOA world.

    > As organizations grow, we create subgroups that focus on a subset of the overall goals (in business this may be sales, accounting, customer service, etc.). Then, we restrict the communication between groups by limiting the pathways and the content. For example, we tell the sales team that they can’t just call up the folks in inventory and start asking them random questions (Hey, Fred, how many widgets do we have in stock? Acme wants to buy 200 or 300. And they’ll need some gizmos to go with those widgets too, how many of those do we have?). Instead we develop stylised ways of interacting. You can submit Form 34X78D, Current Inventory Request Form to the Inventory Information Desk to get that information or you can submit a purchase order (Form 34Y52A) to the Order Processing Desk and those widgets and gizmos will be allocated for ACME. <

    This is some interesting stuff. If we all agree on at least one thing it is the need for ‘limiting’ communication and the need for ‘protocol’ (notice that I’m saying nothing about how this isn’t new here ;). Here’s some more thoughts:

    Let’s call our business MyOrg, the customer Acme, the sales guy Sam, the inventory guy Fred and the customer representative Michelle (from Acme).

    The first thing that I want to point out is that there are still two very *real* modes of communication. Synchronous and asynchronous. When people use the term ‘messaging’ do they mean to imply async processing of messages? Saying that a message is ‘queued’ tends to imply this, unless the term ‘queue’ is used to imply ‘block until lock is acquired’, they achieve the same effect but are suitable for different types of execution, and are still different. I don’t really consider a message to be anything more than ‘data’ (or ‘information’ if you prefer, or maybe even ‘schema’ if you want to use our new found vocabulary), the word itself implies nothing about how the message is formatted, communicated or processed. So ‘messaging’ is simply transferring data, often too it triggers execution, although not necessarily, execution can continue in the same execution context as the notifier (i.e. a local method call, .NET remoting call (in an abstract sense), etc.) or in another execution context (i.e. on a worker thread, message ‘picked up’ from another process (local or remote) from a data store (RDBMS, file system, message queue, etc.), etc.).

    In a computing sense each of these modes of execution can emulate the other. For example, you can use polling, busy-loops, locking, signaling, etc. to force async execution contexts to behave in sync (like an RDBMS). On the other hand, you can time-slice, context switch, etc. to get one synchronous execution context to emulate asynchronous execution (like a CPU).
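
    The second direction, one synchronous execution context emulating asynchronous execution by time-slicing, can be sketched with Python generators. This is a toy round-robin scheduler under stated assumptions: the `task` and `run_round_robin` names are hypothetical, and each `yield` stands in for the point where a real scheduler would switch context.

```python
from collections import deque


def task(name, steps, log):
    """A unit of work that gives up control after every step."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Run all tasks on a single thread, one step at a time, in turn."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run one slice of the task
        except StopIteration:
            continue               # task finished; drop it
        ready.append(current)      # otherwise queue it for another slice


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```

    The steps of the two tasks interleave in the log even though only one thing ever executes at a time, which is the sense in which a single CPU appears to do several things at once.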

    Speaking for myself, as a human ([insert vampire joke here]), I can ‘manage’ several things at a time, but I can only ‘think’ about one thing at a time. In many ways I am synchronous, although I have many subsystems and can time-slice (i.e. change what I’m paying attention to) so in many ways I’m also asynchronous. If you think of a call-centre, calls are queued, because an operator can only handle one at a time. Not for lack of trying I can only speak one word at a time, or press one key at a time to express myself (i.e. expose my state), and I can really only read one thing at a time, or listen to one thing at a time (i.e. receive new state, including requests for my state).

    With this perspective, think of the scenario you discussed again. Each of the paper based (or ‘form’ based) methods that fell out of the bureaucratic evolution of the business process for MyOrg implies asynchronous processing. The thing about the ‘Current Inventory Request Form’ is that when Sam fills it out he will provide ‘reply to’ information so that the response can be dispatched to him. In the ‘real world’ this might be via courier, or e-mail, or fax. We are talking about computer based architectures to facilitate these types of processes, and we’re talking about SOA.

    How is it that Sam is to receive the results of an inventory query in the SOA paradigm? Say Sam has some software that runs on his computer (rich client) that provides a user interface for preparing this ‘document’ (or query, message, transaction, whatever), and MyOrg have a ‘service’ that will receive this document as a request. Will the service *always* queue before processing and return to the client only ‘your request has been queued successfully’? That’s what happens when Sam takes his form to the ‘Inventory Information Desk’ (or places an order at Amazon). The clerk at the desk snidely remarks that he’s interrupting her lunch and glares at him as he pops it on her in-tray. Sam then goes away knowing that his ‘request has been successfully queued’, but he has no idea how long until he will get the results, although the business process has defined a mechanism for how he will eventually get them (i.e. Sam left return address information on the request).

    If we are going to emulate this, then the rich client interface must also set up a ‘service’ of its own to receive results and then have a facility for dispatching them into the user interface. This can be non-trivial. I do this, and I know of two main ways to ‘do it’: spin a worker thread in the client that handles a blocking request to the service, or set up a listener on a TCP socket to receive call-backs from the service. Neither approach is without its own difficulties and each can be forced to emulate ‘synchronous’ processing. The call-back method is super impractical in the *real* world because of those *annoying* firewalls.
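    A rough Python sketch of the worker-thread approach: the blocking request runs on a background thread and its result is handed to the user interface through a queue that the UI thread drains. `query_inventory` is a hypothetical stub for the remote call and `ui_events` is an assumed stand-in for whatever dispatch mechanism the client's UI toolkit provides.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def query_inventory(product: str) -> int:
    """Hypothetical blocking call to the inventory service (stubbed here)."""
    stock = {"widget": 42, "gizmo": 10}
    return stock.get(product, 0)


# Results land here; the UI thread picks them up when it is ready.
ui_events = queue.Queue()


def request_in_background(pool: ThreadPoolExecutor, product: str) -> None:
    """Start the blocking call on a worker thread so the UI thread stays free."""
    future = pool.submit(query_inventory, product)
    # The callback does no UI work itself; it only enqueues the result.
    future.add_done_callback(lambda f: ui_events.put((product, f.result())))


with ThreadPoolExecutor(max_workers=2) as pool:
    request_in_background(pool, "widget")
    request_in_background(pool, "gizmo")
# Leaving the 'with' block waits for both workers to finish.

# Stand-in for the UI thread draining its event queue.
received = {}
while not ui_events.empty():
    product, count = ui_events.get()
    received[product] = count
```

    The TCP listener approach would replace the worker thread with a socket the service calls back on, which is where the firewall trouble starts.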

    On the other hand, I could place a blocking call from the thread that runs the Windows UI message pump to the server (oh, the shame!), or run a command line utility that did the same. Each of these could process the ‘transaction’ (message, request, etc.) synchronously from the client, blocking until the server responded with the results. In reality there would be scores of places where sync/async processing was emulated until the ‘top level’ abstraction of a blocking call to the server that returned results. This is more like Sam having a conversation on the phone with Fred than the form based method from the point of view of synchronous behaviour. Does SOA require emulation of this type of messaging on the client, or can a request block at the server while results are prepared (as with an HTTP request for example)? If SOA only queues requests then what is the overarching architecture for dispatching results? The *real* world really still has the ‘client-server’ paradigm very much embedded, and generally the only ‘realistic’ form of communication is client initiated for any non-trivial system (most systems require more than an e-mail saying ‘your order has been received’, even if it is enough for Amazon). Are clients going to poll the server for updates (POP3 style)? Are clients going to hold their socket open until the server de-queues and processes the request (HTTP style)? Does SOA not offer a strategy for dealing with any of this, thereby offering me nothing (new or not)?
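    For contrast, a small Python sketch of the ‘POP3 style’ option: the service only ever acknowledges a request with a ticket, and the client has to come back and poll for the result. The class `QueuedService` and its methods are invented for illustration; a real service would do the dequeuing on its own schedule and not when the client asks it to.

```python
import itertools


class QueuedService:
    """Toy service that acknowledges requests now and processes them later."""

    def __init__(self) -> None:
        self._tickets = itertools.count(1)
        self._pending = {}   # ticket -> request waiting in the in-tray
        self._results = {}   # ticket -> finished result

    def submit(self, request: str) -> int:
        """Queue the request and return a ticket: 'your request has been queued'."""
        ticket = next(self._tickets)
        self._pending[ticket] = request
        return ticket

    def process_one(self) -> None:
        """Server-side step: take the oldest pending request and finish it."""
        if not self._pending:
            return
        ticket = min(self._pending)
        request = self._pending.pop(ticket)
        self._results[ticket] = f"processed {request}"

    def poll(self, ticket: int):
        """Client-initiated check; returns None until the result exists."""
        return self._results.get(ticket)


service = QueuedService()
ticket = service.submit("current inventory: widget")
first_poll = service.poll(ticket)    # nothing yet, the clerk is at lunch
service.process_one()
second_poll = service.poll(ticket)   # the result is now available
```

    The ‘HTTP style’ alternative is the same exchange with the wait moved inside `submit`, so the client's connection stays open until the result is ready.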

    Another interesting thing about the ‘bureaucratic’ approach was that of ‘protocol’. The forms-based approach helps to push the responsibility for reasonable communication back to the client. If I’m on the phone to Fred and I’m initiating a conversation, I start knowing that the protocol is ‘English’, and that’s about it (save anything I may have learned from experience). The participants can then be ‘inefficient’ by chatting about things that aren’t focused on business (like Sam could say "Hi Fred, how are you today?", which is obviously a *complete* waste of time, especially when Fred says "I’m awesome mate. Did you get home OK after last Friday? What happened with that hotty that you met at the bar? Any action?", and so on, obviously a dismissible offense by now). By exposing the ‘schema’ to clients you can get ‘some’ of the business validation done before the request is placed. This is more true when you’re asked to fill in a form than when you’re asked to ring someone, although there are plenty of examples of efficient voice protocols; the armed forces, for example, have been doing well with these in ‘real-time’ situations over radios for a long time now.

    Even in many *real* organisations like the one in the example it might be necessary for Sam to get on the phone to Fred to find out how many widgets are in stock. This happens because of the failure of the Inventory Information Desk to provide information in a synchronous manner while Sam actually has Michelle on the line who needs to know right now. An HTTP GET operation is more like what he wants, rather than reconfiguring his e-mail client to poll the POP3 server more frequently.

    The results of the inventory request are a perfect example of optimistic concurrency too. Between the time that you receive the results of your request and the time you act on them the inventory can change, meaning that you might be placing an order that you can’t fill. A part of reality is that ‘the more recent your information the more likely it is to be correct’. This means the results of the ‘current inventory request’ aren’t worth keeping for very long, and probably not at all, so having them arrive via e-mail (or some other mechanism for a client persistent and server dispatched artifact) for example can be the sux. Considering your ‘copy’ of data to be ‘authoritative’ is a very big mistake, but much business validation that I see takes place on copies of data. This is very bad, a ‘pit of failure’ if you like.

    Being able to ask Fred what the inventory levels are is important. In this case Fred is the ‘interface’ that provides the service serving as the ‘authoritative’ source of information, the telephone system provides most of the OSI network layers and English (and some Jargon) is our application layer protocol. Perhaps this process has been heavily shrouded by bureaucracy and the ‘Inventory Information Desk’ is the service, while snail-mail and paper forms provide the OSI stack. The problem with the first model is that it’s inefficient, and doesn’t scale well. The problem with the second model is that it’s asynchronous. It may be the case that reporting on inventory (estimates perhaps, certainly non-authoritative once they leave the service’s execution context) is a very quick process, thus the ‘overhead’ of filling out a form, faxing it to inventory, waiting while it sits in someone’s ‘in tray’, then waiting for them to process it and get back to him could be more than Sam is willing to sign up for just to answer Michelle’s query while she’s on the telephone. Sam’d rather place a ‘blocking’ request to the inventory system and wait while his request was serviced immediately, because he needs the information before he can continue ‘working’ (working for Sam is having long-lunches with Michelle from Acme, heh, it’s a tough job, but someone’s got to do it).

    I’m not familiar with how SOA plans to address these two types of transactions. Is it that you always need option two, i.e. the non-blocking calls to the server?

    I’m still pretty concerned about data integrity too. No-one is talking about how to manage optimistic locking while passing around these ‘documents’ that contain massively de-normalised copies of data that is probably ultimately stored in an RDBMS regardless of the object models that it passes through for validation before it gets there. What I’m hearing around the traps (including some stuff on this thread) seems to point to people not being concerned about (or understanding) data integrity. Not caring about data integrity is, as has been pointed out here, "literally criminal" much of the time, i.e. people can go to gaol for that shit. If I’m using services that can’t issue pessimistic locks and I have services that aggregate other services in a ‘heterogeneous’ environment, then how am I to co-ordinate and guarantee atomicity of a distributed transaction?

    If you want the cold hard truth, most people don’t get it. I can make that statement from experience. Most of the SME systems that I’ve seen have failed to cater for transactions when working with ‘copies’ of data, and something like ‘accidentally’ and ‘unknowingly’ blowing away some or part of someone else’s changes while you’re doing an update is a risk that many systems have hard-coded. The only thing that saves them is that ‘it very rarely happens’, or ‘it’s no big deal if it does’, but the system is flawed by design. Microsoft’s introduction of the DataSet in the .NET platform and particularly what people are doing with it verges on criminal too. Obviously my big thing is optimistic concurrency. If something needs to be shouted to the development community in general it isn’t that ‘loose-coupling is important’, it’s ‘optimistic concurrency is crucial!’. Doesn’t everyone know that merge-replication is a nightmare? Necessary perhaps, but not to be taken lightly. Supporting optimistic concurrency with complex documents like the ‘Purchase Order’ or ‘Current Inventory Request’ can be enormously challenging because they rely on potentially stale data when they are created. For example, a purchase order is created, the product is discontinued, the purchase order is processed. Points in time are crucial and mechanisms for determining when ‘people’s view on reality’ is out of sync with ‘the business’s reality’ need to be in place.
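    The ‘blowing away someone else’s changes’ risk can be sketched in a few lines of Python using the standard `sqlite3` module. The `product` table, its `version` column and the two readers are assumptions made up for the illustration; the point is only that an update carrying a stale version touches zero rows and can be reported instead of silently applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT, price REAL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO product VALUES (1, 'widget', 9.99, 1)")
conn.commit()


def read_product(product_id: int):
    """Return (name, price, version); the version is the 'optimistic read-lock'."""
    return conn.execute(
        "SELECT name, price, version FROM product WHERE id = ?", (product_id,)
    ).fetchone()


def update_price(product_id: int, new_price: float, seen_version: int) -> bool:
    """Apply the update only if nobody changed the row since it was read."""
    cursor = conn.execute(
        "UPDATE product SET price = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_price, product_id, seen_version),
    )
    conn.commit()
    return cursor.rowcount == 1  # zero rows touched means the copy was stale


# Two clients read the same copy of the data.
_, _, version_seen_by_alice = read_product(1)
_, _, version_seen_by_bob = read_product(1)

alice_ok = update_price(1, 10.49, version_seen_by_alice)  # first writer wins
bob_ok = update_price(1, 8.99, version_seen_by_bob)       # stale copy, rejected
final_price = read_product(1)[1]
```

    What the second client should do after a rejection (re-read, merge, give up) is a business decision the sketch deliberately leaves open.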

    On a heavily related note, how is it that you can update the de-normal/distributed client data in a ‘transaction’ in a service-oriented system where the ERP and CRM systems refuse to issue locks and you have more than one client accessing these services? The thing about an RDBMS is that it will issue the pessimistic locks that your service *needs* while it is processing the stuff on its queue (like the purchase order, or the current inventory report). By definition of a service (i.e. encapsulates its own data and doesn’t issue locks) I can very rarely have one service co-ordinate with other services and still guarantee ACID compliance. If you ignore the ACID principles you will end up with bogus data, is SOA all about "hey, bogus data is a reality, live with it"?

    SOA still sounds to me like a vain attempt to simplify stuff that can’t be simplified. If a service is only ‘schema’, ‘contract’ and ‘policy’ then it can’t do much by itself. Rather than redefining ‘type’ or ‘interface’ to get around bastardisation of the words that have been tied to mistakes in some people’s systems and trying to pretend that we are doing much more than using XML to encode a request that we used to encode in SQL and dispatching it via HTTP where we used to dispatch it via TCP then why don’t we talk about how we are going to handle concurrency and result dispatching.

    I want to talk about *implementation*, because that’s where I’ll really be able to tear shreds off people (yep, I’m pissed off today, a part of me wants to be apologetic).

    Consider the difference between:

    ———————————————

    Order order = new Order("Acme Inc", "Michelle");

    order.Items.Add(new OrderItem("gizmo", 10));

    order.Items.Add(new OrderItem("widget", 42));

    order.Submit();

    ———————————————

    and:

    ———————————————

    POST:

    <PurchaseOrder ForCustomer="Acme Inc" AuthorizedBy="Michelle">

    <Item Product="gizmo" Quantity="10"/>

    <Item Product="widget" Quantity="42"/>

    </PurchaseOrder>

    ———————————————

    and:

    ———————————————

    // requires: using System; using System.Data; using System.Data.SqlClient;

    using (SqlConnection connection = new SqlConnection(connectionString)) {

        connection.Open();

        SqlTransaction transaction = connection.BeginTransaction();

        try {

            Int32 orderId;

            using (SqlCommand insertOrder = new SqlCommand("proc_InsertOrder")) {

                // Without this the command text is not treated as a stored procedure call.
                insertOrder.CommandType = CommandType.StoredProcedure;

                insertOrder.Parameters.AddWithValue("@forCustomer", "Acme Inc");

                insertOrder.Parameters.AddWithValue("@authorizedBy", "Michelle");

                SqlParameter orderIdParam = insertOrder.Parameters.Add("@orderId", SqlDbType.Int);

                orderIdParam.Direction = ParameterDirection.Output;

                insertOrder.Connection = connection;

                insertOrder.Transaction = transaction;

                insertOrder.ExecuteNonQuery();

                orderId = (Int32)orderIdParam.Value;

            }

            using (SqlCommand insertOrderItem = new SqlCommand("proc_InsertOrderItem")) {

                insertOrderItem.CommandType = CommandType.StoredProcedure;

                insertOrderItem.Parameters.AddWithValue("@orderId", orderId);

                insertOrderItem.Parameters.AddWithValue("@product", "gizmo");

                insertOrderItem.Parameters.AddWithValue("@quantity", 10);

                insertOrderItem.Connection = connection;

                insertOrderItem.Transaction = transaction;

                insertOrderItem.ExecuteNonQuery();

            }

            using (SqlCommand insertOrderItem = new SqlCommand("proc_InsertOrderItem")) {

                insertOrderItem.CommandType = CommandType.StoredProcedure;

                insertOrderItem.Parameters.AddWithValue("@orderId", orderId);

                insertOrderItem.Parameters.AddWithValue("@product", "widget");

                insertOrderItem.Parameters.AddWithValue("@quantity", 42);

                insertOrderItem.Connection = connection;

                insertOrderItem.Transaction = transaction;

                insertOrderItem.ExecuteNonQuery();

            }

            // Both items and the order header become visible together.
            transaction.Commit();

        }

        catch {

            // Undo the partial order, then let the caller see the original error.
            transaction.Rollback();

            throw;

        }

    }

    ———————————————

    and:

    ———————————————

    DECLARE @orderId INT

    BEGIN TRAN

    INSERT INTO [Order] ([ForCustomer], [AuthorizedBy]) VALUES ('Acme Inc', 'Michelle')

    SET @orderId = SCOPE_IDENTITY()

    INSERT INTO [OrderItem] ([OrderId], [Product], [Quantity]) VALUES (@orderId, 'gizmo', 10)

    INSERT INTO [OrderItem] ([OrderId], [Product], [Quantity]) VALUES (@orderId, 'widget', 42)

    COMMIT TRAN -- heh, not bothering to do the error handling, T-SQL is painfully verbose

    ———————————————

    The difference is nothing more than how much time it takes to write them. They all accomplish the same thing. In this simple example the RDBMS is capable of all relevant business validation (assuming I will take items for products that exist in my system regardless of inventory levels, etc.). They all rely on transactions (with queues / pessimistic locking) on the ‘server’ side. What we are talking about is infrastructure to enable me to send the XML based message, but we can’t ignore concurrency.

    I’ve been working on an internal O/R mapping tool for the last year or two that works like the simple examples above, flowing from the client OO model, to the message serialization to the imperative server side code or the database transaction (I use stored procs exclusively and generate most of them, and have mechanisms for handling concurrency and authentication/authorization (now termed policy) so my real code is obviously quite different and far more complicated).

    Speaking about application layer protocols, XML isn’t really necessary as the message expression mechanism, although it can be nice. It’s probably still easier to share my client libraries than to declare and support an XML based messaging protocol (SOAP for example) for enabling my clients, and XML is slower than it needs to be, even Base64 is better (and it’s terrible). If I have a .NET API that can talk to my server, my clients would probably implement in .NET rather than take on board implementing their own. I don’t know why people make such a big deal out of this, I think it’s politics, maybe training too. I’m not sure that XML based protocols are any more ‘human readable’ than say HTTP or SMTP either, although their built-in ability to define hierarchies is useful, and the rigid syntax is helpful for people implementing a parser (seriously though, how many people really need to implement their own parser). Sure we can meet at the application protocol if COM isn’t good enough for you anymore, or if you won’t use my client libraries developed for [insert your development platform here]. Meeting at the application protocol is so old-school though, I’m surprised there is a trend back towards it. I guess HTTP, SMTP, POP3, FTP, etc. have been successful and that’s what they do. Still, I’ve been pretty happy to use ODBC libraries like the ADO (implemented with COM) to abstract application protocols like TDS for example. If you develop server software you tend to develop client libraries too. Opening up your application protocol for public consumption is non-trivial, and it’s far more difficult to handle backwards compatibility for your clients without forcing them to re-write their client software. For example, if I ship libraries I can maintain the interfaces my clients have come to rely on and re-work my application layer protocol. This saves me having to maintain a ‘version one’ and ‘version two’ server, or manage ‘kludging’ various versions of my protocol into my server software to the performance detriment of upgraded clients and maintenance detriment of my server software developers.

    I can tell you from my experience that ‘schema’, ‘contract’ and ‘policy’ are the least of your problems, these are obvious and easily addressed compared to the bigger issues. The problems are managing ‘distributed transactions’ without issuing locks (optimistic concurrency) and result dispatching (to block or not to block, that is the question).

    I still can’t shake the feeling that, again as Don Box put it, I’m "being sold the same shit again".

    John.

  12. John says:

    Thought I might add that I was particularly impressed by this comment from JohnCJ, but wanted to make a few comments about it:

    "We need to recognize that the authoritative record of the business data has to be in these documents, not in the relational store."

    This is really ‘profound’ (careful now ;). I wouldn’t say this was ‘service oriented architecture’ however, perhaps ‘document oriented architecture’. A part of this message is very important, but part of it is incorrect too. In my view it is information technology’s general inability to deliver on this that is really lacking. It is this that has long been the splinter in my mind.

    The thing about the relational model is that by itself the only real record that it keeps of ‘state’ is ‘now’. When you think about it though, data has necessarily been stored in ‘de-normal’ documents for much longer (hey, computers haven’t been around forever), and these have always served business pretty well. The thing is that these have been supplemented with ‘human’ processing. I.e. the *one* guy in charge of a customer account knows about all dealings with the client and can keep an eye out for inconsistencies so the business doesn’t waste time or lose money, the *one* lady that heads up the accounts department can keep the finger on all transactions, again watching for anomalies, etc. They also waste a lot of ‘space’ because they persist facts that are no longer relevant to the business.

    The trouble arises when we try to get a computer to fulfil this previously human role of communicating business information. For example, many of the concurrency problems and information load that arise in a large distributed computing environment simply don’t exist in a small business. This is generally because ‘one person’ is responsible for processing humble amounts of incoming data and that processing is therefore necessarily synchronous. If that person is competent and not overworked then there are very few problems. For example the account manager isn’t authorizing a loan for someone at the same time as recording them as a bad creditor. Yes, as the organisations get larger and things ‘scale out’ many people are responsible for the same things, or different things that can still impact each other. Humans (and business) have a lot of trouble here, and look to technology to solve these problems. Pretending they don’t exist doesn’t help in solving the problem.

    The reason that the assertion that "the authoritative record of the business data has to be in these documents, not in the relational store" is at first entirely attractive but after careful scrutiny falls over is that only the first part is correct. It is true that "the authoritative record of the business data has to be in these documents". The ‘truth’ for the business is found in the record of its various transactions, and these documents are that record and therefore reveal that truth. The problem is the second part of the statement "not in the relational store". This is incorrect. The relational store is the correct place to store atomic data. You can fight it all you want because it feels like *work*, but it has been worked on, meditated upon, and fussed over for so long because it is important, not because a bunch of ‘academic’ wankers felt like having a tug. The real truth is that ‘all important facts expressed in business documents are recorded in the relational data store’. This ensures that there is only one authoritative place for any given piece of information. If the date some information was entered or was relevant is important then this fact must be recorded, if not, then it need not be recorded.

    Knowing the aggregate ‘present state’ of the system is important, but generally systems aren’t built so that I can request the ‘exact state’ from say a week ago, or a month ago. The whole thing with the relational model is that you try to get from the current state to the next state in an atomic fashion, but once you get there you have lost your old state (logs, backups and audit or journalling mechanisms aside). So, if the ‘old state’ is important to the business process then it must be accommodated in your relational design.

    As a case in point with regard to ‘business documents’, consider the bank slips that I get every month (or whatever) from my bank. Say I keep these for two years. They are discrete documents containing de-normalized information. They report a current balance that was correct at the time the statement was issued, my account number, a list of transactions during the period, the address that it was sent to me at, my name and various other information perhaps such as who my branch manager was and how I can contact them or what their mortgage rates were, etc. I really don’t need to keep these documents, safe with the knowledge that the bank is keeping all the data that is important and the rest has probably become obsolete or will over time. For example, if after two years I want to know what my account balance was on 6th June 2003 then I can determine that based on my transaction history. I can ask for a list of all the transactions in May 2002, etc. I probably can’t ask them what address they posted my January 2003 statement to however, because the bank probably doesn’t (they might, but they probably don’t) care about this data and so they haven’t accommodated it in their relational model. The only thing that the bank needs to know about my postal address is my *presently* recorded address. They’ll assume it’s current until they find out otherwise, but they won’t necessarily care (unless for audit purposes) about what it was in the past or when it got changed. Perhaps it’s not a perfect example (I keep thinking of all the possible, though dodgy, reasons that a bank might be interested in keeping my old addresses). At any rate, the real point is that the relational model is designed to store the data in these ‘business documents’. Yes, the documents are useful for describing business process and then deriving the relational model, but once the business process and the relational model have been defined the relational model really is "god’s truth". If it’s not, then you have to spend a great deal of time explicitly determining where duplicate data is being recorded and defining how this duplicate data is to be viewed with respect to being ‘authoritative’. What relevance does my penultimate change of address form have to either me or my bank?
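    The bank example can be made concrete with a short Python sketch: if the atomic facts (dated transactions) are kept, a past balance or a month’s activity is derivable on demand and the old statements add nothing. The ledger entries below are invented figures, held in cents to avoid rounding surprises, and the function names are illustrative only.

```python
from datetime import date

# Hypothetical ledger: (posting date, amount in cents; deposits are positive).
LEDGER = [
    (date(2003, 5, 28), 150000),
    (date(2003, 6, 3), -4250),
    (date(2003, 6, 6), -12000),
    (date(2003, 6, 20), 30000),
]


def balance_on(ledger, day: date) -> int:
    """Balance at the close of 'day', derived from the transaction history."""
    return sum(amount for posted, amount in ledger if posted <= day)


def transactions_in(ledger, year: int, month: int):
    """All transactions posted in the given calendar month."""
    return [
        (posted, amount)
        for posted, amount in ledger
        if posted.year == year and posted.month == month
    ]


balance_6_june = balance_on(LEDGER, date(2003, 6, 6))
balance_5_june = balance_on(LEDGER, date(2003, 6, 5))
june_activity = transactions_in(LEDGER, 2003, 6)
```

    The address a statement was posted to is not derivable this way, which matches the point above: if that fact matters to the business it has to be recorded as data in its own right.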

    In short, the relational model is the best method that the software development community at large has been able to come up with for persisting data and maintaining data integrity. Other methods for storing data and ‘finding truth’ in data exist, but they are more complicated, short-sighted and prone to error than the facilities available in the relational model. The relational model is for storing ‘atomic’ data. What your business considers atomic is largely debatable or for you to define. For example, a business might consider a postal address as an atomic piece of information if all it needs to do is address letters to you, although it might also have cause to care about more discrete aspects of the address, such as state and postcode, making the address non-atomic from the business perspective. Updating an address will still be an atomic operation however. Address is a crappy example, because if one part of it changes so does the entire ‘entity’; a change of address is really a ‘new address’ given that an address is really a primary key. If a person changes their name however (i.e. gets married) then their birth date does not change. Name and birth date are functionally dependent on the ‘primary key’ but they do not comprise the primary key.

    So, I agree that the authoritative record of the business data is in the business documents, and this perspective is remarkably useful when defining systems, but I say that this data is best recorded in a relational model, regardless of how the ‘documents’ (copies of data) are created, passed around, or otherwise used.

    As an example of a more complex concurrency anomaly consider a (really basic, somewhat bogus) schema like this:

    Survey (Id)

    Question (Id, SurveyId, Text)

    Answer (UserId, QuestionId, Text)

    I have a conversation like this:

    User to Service: REQUEST Survey:42

    Service to User:

    <Survey Id="42">

    <Question Id="123" Text="What is your favourite colour?"/>

    <Question Id="124" Text="Do you like cheese?"/>

    </Survey>

    Manager to Service: REQUEST Survey:42

    Service to Manager:

    <Survey Id="42">

    <Question Id="123" Text="What is your favourite colour?"/>

    <Question Id="124" Text="Do you like cheese?"/>

    </Survey>

    Manager to Service: POST UPDATE

    <Survey Id="42">

    <Question Id="123" Text="Is your favourite colour green?"/>

    <Question Id="124" Text="Do you like cheese?"/>

    </Survey>

    [Service conducts business processing, ensuring that there are no existing answers for this survey because the semantics of Question:123 have changed, there are none so the requested changes are applied]

    User to Service: POST INSERT

    <Survey Id="42">

    <Answer QuestionId="123">Blue</Answer>

    <Answer QuestionId="124">Yes!</Answer>

    </Survey>

    [Service conducts business processing, everything seems in order, results are inserted]

    Customer to Service: REPORT Survey:42

    <Survey Id="42">

    <Result>

    <Question Text="Is your favourite colour green?">Blue</Question>

    <Question Text="Do you like cheese?">Yes!</Question>

    </Result>

    </Survey>

    Customer to Manager: I’m getting bogus data in my reports! Your users are idiots!

    Manager to User: Read the friggin question! You’re hopeless!

    User to Manager: Screw you! I quit!

    At least if the question had initially been "Is your favorite colour blue?" then the customer would not have been upset, they just would have been viewing bogus data in blissful ignorance.

    Heh, it’s 5AM, I think I’ve made my point about the need for supporting concurrency in a distributed environment. This example is more complex than my previous example, but is still simplistic in many ways compared to real business processes. In this case, conceptually the ‘Survey’ should have been versioned. This could have been accomplished with a ROWVERSION on each record, submitted back as part of the survey response. An example of ‘complex validation’ was the requirement of no existing answers for a question when a question was altered, because you can’t change a question that has already been answered without considering what you want to do with existing answers. Perhaps you warn, perhaps you deny, but you need to consider. Perhaps the server could have been created such that it was aware that it had issued a read-lock to a specific user and denied alteration of the survey, or denied responses from the user after alteration, although this data would still need to be recorded somewhere (in the relational model) and is largely impractical, particularly in a distributed system that only issues optimistic read-locks (data is a read lock).
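    A minimal Python sketch of the versioning idea, replaying the conversation above. The `SurveyService` class and its single whole-survey version number are simplifying assumptions (a real schema might carry a row version per question); the behaviour to notice is that the user’s answers, prepared against version 1, are refused once the manager’s change has produced version 2.

```python
class StaleSurveyError(Exception):
    """Raised when answers were prepared against an outdated copy of the survey."""


class SurveyService:
    def __init__(self, questions: dict) -> None:
        self.version = 1
        self.questions = dict(questions)
        self.answers = []

    def fetch(self):
        """Hand out a copy of the survey together with the version it reflects."""
        return self.version, dict(self.questions)

    def update_question(self, question_id: int, text: str) -> None:
        """Business rule from the example: no edits once answers exist."""
        if self.answers:
            raise ValueError("survey already has answers")
        self.questions[question_id] = text
        self.version += 1

    def submit_answers(self, seen_version: int, answers: dict) -> None:
        """Accept answers only if they refer to the current questions."""
        if seen_version != self.version:
            raise StaleSurveyError(
                f"answers refer to version {seen_version}, current is {self.version}"
            )
        self.answers.append(dict(answers))


service = SurveyService({123: "What is your favourite colour?", 124: "Do you like cheese?"})

user_version, _ = service.fetch()                             # User: REQUEST Survey:42
service.update_question(123, "Is your favourite colour green?")  # Manager: POST UPDATE

try:
    service.submit_answers(user_version, {123: "Blue", 124: "Yes!"})  # User: POST INSERT
    outcome = "accepted"
except StaleSurveyError:
    outcome = "rejected"
```

    With the rejection in hand the client can re-fetch the survey and show the user the changed question, so the report never pairs "Blue" with a question the user did not see.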

    I know from experience that most systems don’t deal with these problems, and that is why most systems are crap.

    John.

  13. Bart Elia says:

    I have so many things to respond to this, I don’t know where to start. If one starts down the road of discussing things in similes and metaphors you lose track of the real problem so I will try to find a balance.

    To the response:

    [Pat] If I (a service) value my data, I’m not going to let others change it. This is about independence, autonomy, and trust. The IRS does not let me perform optimistic concurrency control against their backend database when I post my tax return. The premise behind services is a loose coupling and distrust between the participating services. In my opinion, this means that a read/write semantic against backend data is completely unacceptable.

    I could not disagree more. You submit a revised IRS tax return that says find the primary key(s) (e.g.- My social security number, form 1040) and a timestamp (Tax year 2003) and update the data of interest as follows throwing out everything in the ‘working table’ of business data.

    Likewise…

    [Pat] SOA is about interacting with a business-function semantic. It is also about the assumption that when you do your business function, it is only connected via messaging. This leads us to a style of interaction that is reminiscent of the way we interact with businesses. I may place a hotel reservation (and, perhaps, later on cancel that reservation). I don’t fiddle with the hotel’s backend database records.

    Ummm… same thing. Primary key(s) of my name and a date and perform an update or delete accordingly. Oh, my travel agent already contacted the hotel to cancel for me? Okay, my business process of canceling got a concurrency lock violation because someone already performed an action on the data I was interested in since the time I *read* the data of booking the reservation.

    Sorry, this sounds like a great topic to pay too much to hear at a technical seminar. We keep re-inventing the same problems in the computer industry. We had mainframes with a dumb terminal. We had rich clients that were better clients and performed better because of the distribution of processing. We had browsers because people liked the centralized management of control as was the mainframe. We are now getting *smart clients* that do away with some of the arguments promoting browsers. I am sure the pendulum will swing again back to browsers or the like in another 5 years.

    Those who do not learn from the past are destined to repeat it.

    This is not to say that SOA makes many issues easier to understand. Messaging based systems have been around a long time. It used to be the message was a flat file that was dropped in a certain directory between mini/mainframe computers. Now it is xml over http. SOA is a formalization of an old pattern in new clothes that makes a difficult messaging based approach to systems much easier to understand. The physics do not change, just the understandability. And that is NOT a bad thing!! I much prefer dealing with XML than flat files in some arcane type definition. Give me schema!!!

    I guess in closing, I am not discouraging SOA but let’s be real about what it is: a simplification of messaging based systems like never before that allows for wider consumption of them and all their benefits. That being said, let’s look at the pros and cons and lessons learned.

  14. Pat Helland says:

    Re: http://blogs.msdn.com/pathelland/archive/2004/03/18/91825.aspx#91911

    re: SOA is like the Night Sky… 3/18/2004 10:26 AM Marcus Mac Innes

    Pat, what you are saying is very clear. Please could you comment on what appears in a related MS blog post by Ramkumar Kothandaraman, which seems contradictory:

    http://blogs.msdn.com/ramkoth/archive/2004/03/08/85802.aspx

    Regards,

    Marcus Mac Innes

    Marcus,

    Ram and I are saying the same thing but we (actually I) screwed up the terminology. In my post It’s All in a Name: What’s a Service?

    http://blogs.msdn.com/pathelland/archive/2004/03/11/88058.aspx

    I spoke about “Foos” and “Bars”. Our team (including Ram) has been using the terms:

    Service — (same as a Foo) for the named endpoint for an interaction, and

    Business-Service — (same as a Bar) for the collection of data, code, and Foos that is a disjoint set.

    Looking at the four tenets of an SOA:

    Explicit Boundaries: Really applies to a Business-Service (aka Bar)

    Autonomy: Really applies to a Business-Service (aka Bar)

    Schema/Contract: Really applies to a Service (aka Foo)

    Policy: Really applies to a Service (aka Foo)

    So, Ram and I were both saying the same thing, but Ram was using the terminology that we had last discussed. I screwed up at the end of my discussion and should have used Business-Service and Service as the terminology. I will post a new blog to this effect.

    Love,

    Pat

  15. Pat Helland says:

    Re: http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#95000

    First of all, I would like to apologize to John for violating Weblog etiquette.

    I have removed his IP address from my original message since I am allowed to edit it. There is WAY too much valuable commentary in his response for me to want to delete it (the only option I can figure out). Again, please accept my apologies for my error… it was due to naïveté. I won’t make THAT mistake again (just other mistakes).

    I couldn’t figure out how to thoroughly answer this without copying the stuff and inserting my responses point-by-point. Also, I edited the typos for legibility that John reported in http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#95009

    [Pat-3/22] Refers to the text I blogged on March 22nd.

    [John-3/23] Is the text that John responded on March 23rd (including minor edits he requested)

    [Pat-4/3] Is the new stuff I am adding today.

    Love,

    Pat

    ———————————-

    [Pat-3/22] My assertion is that a distrusting relationship with autonomy WILL NOT include data updates of the backend database (even with optimistic concurrency control).

    [John-3/23] Then your service is necessarily read-only. My assertion remains: data is a read-lock.

    [Pat-4/3] I am trying to differentiate between direct data updates (e.g. CRUD) and data updates that are mitigated by business logic. When I reserve a hotel room, there are data updates but I most certainly am not allowed to whack on the Sheraton’s back-end database with CRUD operations. The Sheraton’s service is NOT read-only. Nor am I allowed to directly update the back-end system.

    [John-3/23] The reality tends to be that if you have data you need to update it. Unless your service is generating that data it needs to source it from somewhere. In order to get data from somewhere you have to leave your execution context (when I say this, I’m talking about ‘serial execution’ or ‘sequential execution’, I’d say thread, but I want a term more generic than ‘thread’, at the end of the day even a thread is only an abstracted execution context, down in the hardware locking, blocking and switching is maintaining the integrity of this ‘abstract serial execution context’ for you, I pulled ‘execution context’ out of the air, but I think it’s a fair enough term), meaning that you need a method for dealing with concurrency. There are really only two ways to deal with concurrency. Optimistic and Pessimistic. I don’t want to get hung up on typical ways for implementing these types of concurrency mechanisms because I’m sure we all know about them, but in general pessimistic concurrency serializes access to data (i.e. one at a time) optimistic concurrency allows for non-serialized access to data (i.e. many execution contexts can be reading and modifying the same data at the same time).

    [Pat-4/3] It is my assertion that there are different kinds of data and that there’s a difference between how data is treated on the INSIDE of the service and how it is treated on the OUTSIDE of the service. When I am interacting with another business I am an outsider. Consider Amazon, they have lots of private data and they have lots of public data. The product catalogs, price-lists, reader’s reviews, and “usually-ships-in-24-hours” information is exposed to outsiders. The exact inventory and the cost/profit structure for the products are kept private (as they should be).

    [Pat-4/3] Furthermore, when I interact with Amazon, I perform business functions to add items to my shopping basket (can’t access the records directly), supply a new credit card (still can’t access the records directly). In all cases, I am sending information off to Amazon which is always processed by business logic before impacting their database. This is neither optimistic nor pessimistic concurrency control but a different style of interaction.

    [John-3/23] Obviously if multiple execution contexts are modifying the ‘same data at the same time’ they are working on copies of the data. There is still "God’s Truth" about the data, but it is (or at least *should be*) well and truly protected by your service. The only execution context that can know "God’s Truth" about the data is the service, and it is responsible for serializing access with a form of pessimistic concurrency (locking, queuing, etc) to maintain its integrity. So, at the basic level a DBMS is a service. You pass it messages (SQL statements) and it ‘serializes’ (as in ensures the ACID principles are met, usually by guaranteeing ‘serial’ processing of queued messages at an abstract level) those for you. The problem with a DBMS by itself is that (not for lack of trying for decades) it doesn’t provide a rich or secure enough service just by itself. It’s ‘easier’ to wrap the DBMS in another ‘service’ where business logic can be implemented imperatively, typical stuff like application level authentication, authorisation, complex validation, etc. A DBMS is essentially a ‘multi-purpose’ environment, and this is a ‘bad thing’ because developers or users can easily undermine attempts at maintaining data integrity (when your DBMS can’t (or doesn’t) model all possible constraints), thus we create a ‘service’.

    [Pat-4/3] SQL is not really a service because the semantics of the SQL language is effectively CRUD. [ I am not religious about stored procedures, they are just business logic deployed inside the database engine. When I speak of database interaction, though, please understand that I am referring to the data-centric functions (SELECT and UPDATE) and consider stored procedures to be business logic.]

    [Pat-4/3] As we delve more into the semantics of service-level interactions, it becomes clear that interactions across distrusting services have a different semantics than classic (optimistic or pessimistic) concurrency control. The reservation/cancellation model is how most of these interactions occur and that is a very different form of concurrency control that is based on the semantics of the resources being managed by the service.

    [John-3/23] In my view, a service is just an ‘application server’. It receives messages and processes them. Simple right? All it does is ‘restrict’ what you could do on the DBMS.

    [Pat-4/3] Restrict and add to what you can do with a DBMS… sounds like the “Subset-with-Extensions” joke I’m used to hearing. The service boundary is providing a very different semantic than a DBMS and that is where you and I are disagreeing with (or misunderstanding) each other. I interpret optimistic (and pessimistic) concurrency control as supporting a CRUD (or SELECT/UPDATE) semantic. That is very different from how I interact with another business (e.g. a hotel or the IRS). SOA is about promoting the distrusting business-function semantic for interaction between the pieces of code. This is different than concurrency control (even though there are some really interesting parallels between confirm and cancel and concurrency control).

    [Pat-3/22] A distrusting service will insist on verifying the behavior implied by incoming work through its business logic. I’m fully aware that optimistic concurrency control can be made to work across distances and without holding locks. The issue is that this is unsafe behavior.

    [John-3/23] Wait a minute. You said ‘optimistic concurrency control can be made to work across distances and without holding locks’. But I said ‘data is a read-lock’. They are contradictory assertions. I’m not willing to change my mind. So, we have a vocabulary problem. If we aren’t talking the same language we can’t communicate (this is actually a very hard philosophical problem that I don’t believe can be adequately overcome, so let’s not over analyse this just now, let’s just assume it is possible for us to have meaningful communication (whatever you think that means)).

    [Pat-4/3] I am very comfortable with your statement that data is a read-lock in an optimistic concurrency control environment. I have absolutely no doubt that you and I completely understand both optimistic and pessimistic concurrency control. The issues causing the debate revolve around the types of data shared across the boundaries and what to do with that data.

    [John-3/23] When you think of a ‘lock’ you are thinking of pessimistic concurrency. That’s fine, because that’s a normal and typical way to view a lock. Such a lock is ‘explicit’. You acquire the lock, and once you have you know that you are the only holder of that lock. While you have the lock you can modify the data. The thing about an ‘optimistic lock’ is that it is ‘implicit’ and you have it as soon as you have a copy of some data. If that data doesn’t come with a mechanism for determining the ‘version’ of the data then you can’t have optimistic concurrency. You need a method of knowing when you lose a race with another execution context so you can deal with the problem, this is different to trying to avoid getting into a race in the first place, but it is still ‘locking’ in the sense that it is a mechanism for dealing with concurrency.
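
    A minimal sketch of the versioned check John describes, assuming a hypothetical in-memory `VersionedStore` (the class, its method names, and the `booking-42` key are illustrative only, not taken from any real product). Each read hands back a copy tagged with a version, and a write succeeds only if that version is still current, which is how the loser of a race finds out that it lost.

```python
from dataclasses import dataclass


class StaleVersionError(Exception):
    """Raised when a writer's copy is older than the stored record."""


@dataclass
class Record:
    value: str
    version: int = 0


class VersionedStore:
    """Hypothetical in-memory store; every successful write bumps the version."""

    def __init__(self):
        self._records = {}

    def put_new(self, key, value):
        self._records[key] = Record(value)

    def read(self, key):
        rec = self._records[key]
        # Hand back a copy: holding it is the 'implicit read-lock'.
        return Record(rec.value, rec.version)

    def write(self, key, value, expected_version):
        rec = self._records[key]
        if rec.version != expected_version:
            # The caller's copy is stale: it lost the race.
            raise StaleVersionError(key)
        self._records[key] = Record(value, rec.version + 1)


store = VersionedStore()
store.put_new("booking-42", "non-smoking")

# Two execution contexts each take a copy of the same record.
mine = store.read("booking-42")
yours = store.read("booking-42")

# The first writer wins and the version moves on.
store.write("booking-42", "smoking", yours.version)

# The second writer still holds version 0 and is told so.
try:
    store.write("booking-42", "cancelled", mine.version)
    lost_race = False
except StaleVersionError:
    lost_race = True
```

    Without the version tag the second write would silently succeed, which is the case John is warning about.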

    [Pat-4/3] What am I locking when I reserve a king-sized non-smoking hotel room at the Sheraton for Friday night? The Sheraton is clearly doing something to ensure that I have a room when I come sauntering in at 10PM. I would characterize this as neither pessimistic nor optimistic concurrency control. Is there concurrency control of some sort? I say yes. They have a fixed pool of king-sized non-smoking rooms and they want to maximize their occupancy (and their income) for Friday night. This is in the face of people canceling reservations a few days ahead of the 24-hour deadline (which can cause loss of revenue) and people wanting to extend their stay. The real special thing about SOA is that it is dealing with this style of interaction. It is much coarser than the interaction across component boundaries (where shared transactions and optimistic/pessimistic concurrency control become the relevant discussion).

    [John-3/23] I guess my argument is that you are using the term ‘lock’ to mean a pessimistic lock, and don’t want to concede my point that ‘data is a read-lock’. In my view I’m right, so change your mind or change mine (the latter might be quite difficult). 😉

    [Pat-4/3] Again, I’m happy to agree that data is a read-lock in an optimistic concurrency controlled style of interaction across environments. I am simply arguing that the reason for defining SOA and services is to promote thinking and work around a different (and more distrusting) style of interaction.

    [Pat-3/22] If I (a service) value my data, I’m not going to let others change it.

    [John-3/23] What you mean is: If I value my data, I’m not going to let others change it without going through me.

    [John-3/23] Data still needs to be changed.

    [Pat-4/3] Yup, but that is not necessarily the classic concurrency control mechanisms.

    [Pat-3/22] This is about independence, autonomy, and trust.

    [John-3/23] At the end of the day all it’s ever really about is data integrity.

    [Pat-4/3] Yup. But a big part of that is that I don’t want YOUR business logic CRUDding on MY data.

    [Pat-3/22] The IRS does not let me perform optimistic concurrency control against their backend database when I post my tax return.

    [John-3/23] Posting your tax return is in many ways a ‘write-only’ operation. You provide your primary key (in Australia that’s a Tax File Number) and a whole heap of ‘new’ data. Say I posted my tax return twice, what then? This isn’t a great example, because the operation is ‘mostly’ a ‘write’ operation. I’ll give a better example below.

    [Pat-4/3] Sure… all messaging is really a write-only operation and my interaction with the IRS by submitting my tax-return is really a message (which cost me a lot of accounting fees to create).

    [Pat-3/22] The premise behind services is a loose coupling and distrust between the participating services.

    [John-3/23] Maybe. Once again, so is anything else. My functions are ‘supposed’ to be loosely coupled and not ‘trust’ their input, and so on..

    [Pat-4/3] This is where the “I’ve heard all this before” and “SOA is just objects warmed over” gets to be an interesting discussion. There is a LOT of truth to the fact that object orientation was the big pusher of encapsulation (although the modularity push of the 1970s was significantly there, too).

    [Pat-4/3] My big hang-up on objects and components is that, while they are phenomenal tools and have made huge progress, the notions of encapsulation were frequently not followed through on by the users of the technology. The originators of OO lived in a mind-set which thought of data as in-memory and in-process data. There were big discussions about the evils of using global variables (which I agree is evil). When these technologies were deployed in enterprise environments, they were not always used in a way that provided great encapsulation. The database and its records became the “globals” that allowed the circumvention of encapsulation. Many programmers knew better but many did not. As discussed in other blog comments (see http://blogs.msdn.com/pathelland/archive/2004/03/11/88058.aspx#92937) the enterprise’s data commonly becomes so interconnected that it is almost impossible to cleave it apart. SOA is about encouraging coarser granularity separation. I am all in favor of implementing services using objects/components on the inside but I think we need a coarser grained style of interaction between the services. You can legitimately say that we’ve all seen this before with Message-Oriented-Middleware, EAI, B2B and things of this ilk. I say absolutely. Still, there is value in trying to unify these efforts and differentiate from components. This debate is about clarifying the differentiation from components (which, at least so far, you are skeptical of).

    [Pat-4/3] This is further confused by the spectrum of emphases by the SOA proponents. Not everyone agrees with me that transactions should not be shared across service boundaries. Indeed, WS-Transactions is in conflict with my personal beliefs. My only defense of the SOA effort and the fact that it is not 100% cohesive is that I’m old enough to remember the advent of object oriented computing and the intense debates surrounding that technology. The SOA crowd is really pretty cohesive when you look back at history.

    [Pat-3/22] In my opinion, this means that a read/write semantic against backend data is completely unacceptable.

    [John-3/23] Well then, how do you propose I update the telephone number for my client Acme Inc? My ‘CRM service’ controls all access to data to ensure integrity, it doesn’t support pessimistic concurrency, and it does support multiple clients (aka consumers or users).

    [John-3/23] I have multiple users. I therefore implicitly have more than one execution context. It’s still the same old problem, I could pass messages around like this:

    [John-3/23] Bob to Boss: Hi, I’m from Acme Inc. We have a new phone number.

    [John-3/23] Boss to Bob: Sure Bob, what is it?

    [John-3/23] Bob to Boss: 555-1234

    [John-3/23] Boss to Bob: No problems Bob, I’ll get someone to update this right away.

    [John-3/23] Boss to Jane: Acme Inc’s new phone number is 555-1234, please update our records.

    [John-3/23] Jane to Service: REQUEST "Acme Inc"

    [John-3/23] Sally to Joe: Hi, I’m from Acme Inc. There is a problem with some field on my latest report.

    [John-3/23] Service to Jane: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: ABC"

    [John-3/23] Joe to Service: REQUEST "Acme Inc"

    [John-3/23] Service to Joe: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: ABC"

    [John-3/23] Joe to Sally: Is some field ABC?

    [John-3/23] Jane to Service: UPDATE "Key: Acme Inc, Phone: 555-1234, SomeField: ABC"

    [John-3/23] Sally to Joe: No. It’s supposed to be XYZ.

    [John-3/23] Jane to Boss: No problems Boss. I’ve updated the CRM Service.

    [John-3/23] Joe to Service: UPDATE "Key: Acme Inc, Phone: 555-2212, SomeField: XYZ"

    [John-3/23] Joe to Sally: Thanks Sally, I’ve updated your details in our CRM Service.

    [John-3/23] Boss to James: Call Acme Inc and tell them they owe us $1,000,000.

    [John-3/23] James to Service: REQUEST "Acme Inc"

    [John-3/23] Service to James: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: XYZ"

    [John-3/23] James to 555-2212: Hi, you owe us $1,000,000

    [John-3/23] 555-2212 to James: You must have the wrong number.

    [John-3/23] James to Boss: I rang Acme Inc but apparently we have the wrong number.

    [John-3/23] Boss to James: Hmm, Jane told me she updated it this morning.

    [John-3/23] James to Boss: Yeah, well we all know that Jane is stupid and doesn’t know how to use the computer.

    [John-3/23] Boss to Jane: You told me you updated the system this morning.

    [John-3/23] Jane to Boss: I did. I entered the new phone number.

    [John-3/23] Boss to Service: REQUEST "Acme Inc"

    [John-3/23] Service to Boss: REPLY "Key: Acme Inc, Phone: 555-2212, SomeField: XYZ"

    [John-3/23] Boss to Jane: Well the old phone number is still in the system. You must have done something wrong.

    [John-3/23] James to Helen: Yeah, Jane is hopeless. She always gets things wrong.

    [John-3/23] Jane to Boss: I did it. I promise.

    [John-3/23] Boss to Jane: We’ll have a meeting about this later. Come and see me at 3pm.

    [John-3/23] Helen to Boss: A lot of people have told me that Jane is stupid.

    [John-3/23] Boss to Jane: This is the last time. You’re fired.

    [John-3/23] Or something equally tragic. The point is, a service needs to share data. In this example, each message was handled transactionally, was atomic, met all business rules, etc. we still violated our data integrity however. You can’t have a service like this without optimistic concurrency. You can’t protect your data from your clients when you have to share your data with them!
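
    The interleaving in the dialogue above can be replayed in a few lines. This is only a sketch: `NaiveCrmService` is a hypothetical stand-in whose `request` returns a copy of the record and whose `update` overwrites the whole record with no version check, which is the behavior the REQUEST/REPLY/UPDATE messages imply.

```python
class NaiveCrmService:
    """Hypothetical service: REQUEST returns a copy, UPDATE overwrites the whole record."""

    def __init__(self):
        self._customers = {"Acme Inc": {"Phone": "555-2212", "SomeField": "ABC"}}

    def request(self, key):
        # Each caller gets its own copy of the record.
        return dict(self._customers[key])

    def update(self, key, record):
        # Whole-record overwrite, no check that the caller's copy is current.
        self._customers[key] = dict(record)


service = NaiveCrmService()

# Jane and Joe both read before either of them writes.
jane_copy = service.request("Acme Inc")
joe_copy = service.request("Acme Inc")

# Jane writes the new phone number.
jane_copy["Phone"] = "555-1234"
service.update("Acme Inc", jane_copy)

# Joe writes SomeField from his stale copy, carrying the old phone number with it.
joe_copy["SomeField"] = "XYZ"
service.update("Acme Inc", joe_copy)

# What James and Boss see later.
final = service.request("Acme Inc")
```

    Each individual message is handled atomically, yet `final` holds the old phone number again: Jane's change has been overwritten by Joe's whole-record update.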

    [Pat-4/3] I certainly understand the Lost Update Problem and the usage of either optimistic or pessimistic concurrency control to guard against this challenge. There are many more subtle concurrency control challenges that you could have highlighted had you chosen.

    [Pat-4/3] I am arguing that the CRM service will need to export a set of business-functions. One of these is clearly going to be Change-Customer-Phone-Number or something like that. Another could be Modify-Customer-SomeField. Depending on the semantics of SomeField, it may make a ton of sense to allow interleaved updates to the Phone-Number and to SomeField. I would then turn your example from:

    Jane to Service: UPDATE "Key: Acme Inc, Phone: 555-1234, SomeField: ABC"

    Into

    Jane to Service: Change-Customer-Phone-Number, Key: Acme Inc, Phone: 555-1234

    [Pat-4/3] I understand that the updating of the customer address and/or phone numbers are examples that sound just like you want to have optimistic concurrency control and a customer-record as the granularity for the concurrency update. This is an example where the advantages are more subtle but let’s give it a try. My argument is that the customer-record is something over which the CRM system maintains a public and a private view. There may be internal information about the value of the customer to the company or the company’s opinion of the customer’s credit-worthiness. This information is likely only used by certain individuals using specialized interfaces. Within the public data, there may be information about the customer that is not directly updateable. For example, the customer may be a Silver, Gold, or Platinum level customer. To change levels, a certain amount of business must accrue. My frequent-flyer level at United doesn’t become GOLD until I hit 50K miles per year and 1K until I hit 100K miles per year. You don’t just update the field.

    [Pat-4/3] The act of accepting a change to a Phone-Number or a change to SomeField (again hard to speak to the semantics of SomeField) would be a business function. When a service (in this case a CRM service) wants to allow an outsider to propose work, it formalizes the possible work as a service-request. In this case the service-request is a Change-Customer-Phone-Number. This will run business logic that will decide if it is a good idea to perform the change, perhaps authorize the person proposing the change, and will probably perform a semantic which remembers that starting April 3rd, 2004, the new phone number for Acme is 555-1234 but from August 9th, 1988 through April 3, 2004 the phone number used to be 555-2212. To ensure that all of these operations really do occur (and the old phone number and its history accurately recorded), the service will perform the business logic itself, not allowing it to be performed across the network.
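
    A rough sketch of what such a service-request could look like, under stated assumptions: the `CrmService` class, its method names, and the set of authorized callers are all hypothetical, and the history is assumed to be appended in chronological order. The point it illustrates is that the caller names a business function and the service runs its own logic (authorization, history keeping) rather than accepting a whole-record overwrite.

```python
from datetime import date


class CrmService:
    """Hypothetical CRM service that exposes business functions, not record updates."""

    def __init__(self):
        self._some_field = {"Acme Inc": "ABC"}
        # Phone history per customer: (effective_from, number), oldest first.
        self._phone_history = {"Acme Inc": [(date(1988, 8, 9), "555-2212")]}

    def change_customer_phone_number(self, key, new_phone, effective, requested_by):
        # Business logic runs inside the service, not across the network.
        if not self._is_authorized(requested_by):
            raise PermissionError(requested_by)
        # Remember the old number instead of overwriting it.
        self._phone_history[key].append((effective, new_phone))

    def modify_customer_some_field(self, key, value):
        # Touches only SomeField, so it cannot clobber a phone change.
        self._some_field[key] = value

    def current_phone(self, key):
        return self._phone_history[key][-1][1]

    def phone_on(self, key, day):
        # Latest number whose effective date is on or before the given day.
        current = None
        for effective, number in self._phone_history[key]:
            if effective <= day:
                current = number
        return current

    def some_field(self, key):
        return self._some_field[key]

    def _is_authorized(self, requested_by):
        # Assumption for this sketch: a fixed set of allowed callers.
        return requested_by in {"Jane", "Joe"}


crm = CrmService()

# Jane's and Joe's requests interleave, and both survive.
crm.change_customer_phone_number("Acme Inc", "555-1234", date(2004, 4, 3), "Jane")
crm.modify_customer_some_field("Acme Inc", "XYZ")

new_phone = crm.current_phone("Acme Inc")
old_phone = crm.phone_on("Acme Inc", date(2000, 1, 1))
field_value = crm.some_field("Acme Inc")
```

    Because each request carries only the intent it needs, the interleaving that lost Jane's update in John's trace does no damage here.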

    [Pat-4/3] As I mentioned above, this allows for an increased possibility that interleaved requests (where it makes sense for them to be allowed) are allowed. This gets phenomenally interesting when dealing with orders and reservations. I can have an inventory stock of widgets and process requests for widget-orders in which I reserve the stock. If I have 10,000 widgets in stock I can program to allow lots of interleaving orders for 100s of widgets each and cope with some canceling and some confirming and shipping. I enter a world in which I program for the uncertainty of orders that may be canceled while ensuring I have sufficient stock of widgets in my inventory. Indeed, I may get into overbooking (as airlines and hotels do) as a business decision based on the cost of declining a potential order, the probability that I fail to satisfy all orders, and the cost of an overbooking. This is not simply about concurrency control, it is about a higher-level semantic of interaction!
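
    The reserve, cancel, and confirm pattern for the widget stock might be sketched as follows. Everything named here is an assumption made for illustration: the `WidgetInventory` class, the `overbooking_allowance` parameter, and the figure of 500 widgets are not from the discussion, and a real overbooking policy would be a business decision rather than a constant.

```python
class OutOfStockError(Exception):
    """Raised when an overbooked order cannot actually be shipped."""


class WidgetInventory:
    """Hypothetical service that reserves stock for orders that may later be canceled."""

    def __init__(self, on_hand, overbooking_allowance=0):
        self.on_hand = on_hand
        self.overbooking_allowance = overbooking_allowance
        self._reserved = {}  # order_id -> quantity
        self._next_id = 1

    def reserved_total(self):
        return sum(self._reserved.values())

    def reserve(self, quantity):
        # Accept the order if the total promised stays within stock plus allowance.
        limit = self.on_hand + self.overbooking_allowance
        if self.reserved_total() + quantity > limit:
            return None  # declined
        order_id = self._next_id
        self._next_id += 1
        self._reserved[order_id] = quantity
        return order_id

    def cancel(self, order_id):
        # Canceling simply releases the promise.
        self._reserved.pop(order_id, None)

    def confirm_and_ship(self, order_id):
        quantity = self._reserved[order_id]
        if quantity > self.on_hand:
            # The cost of overbooking shows up here.
            raise OutOfStockError(order_id)
        del self._reserved[order_id]
        self.on_hand -= quantity


inventory = WidgetInventory(on_hand=10_000, overbooking_allowance=500)

# 105 interleaved orders of 100 widgets each fill stock plus the allowance.
orders = [inventory.reserve(100) for _ in range(105)]
rejected = inventory.reserve(100)  # one more would exceed the limit

# Ten customers cancel, which makes room for a new order.
for order_id in orders[:10]:
    inventory.cancel(order_id)
accepted = inventory.reserve(100)

# One of the remaining orders is confirmed and shipped.
inventory.confirm_and_ship(orders[10])
```

    No record is locked on behalf of any customer at any point; the service tracks promises against a pool and decides for itself how much uncertainty to carry.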

    [Pat-3/22] Again, [SOA is] about the semantics available across distrusting boundaries, not the ability to use optimistic concurrency control.

    [John-3/23] Once again, that’s too simple. I have to establish trust. I have to authenticate and authorize. A service that never trusts anything will never do anything. An interface at any level of abstraction (or in any sense of the word) is a boundary.

    [John-3/23] My complaint is simply that nothing here is new. Messaging is not new. Transactions are not new. Data integrity is not new. Establishing trust is not new. Exposing an interface is not new. Defining a contract as part of the interface is not new. Hiding implementation behind an interface is not new.

    [John-3/23] (When I say interface btw I mean it in the most general sense, generally I consider ‘interface’ and ‘contract’ inextricably tied to each other and inseparable, and I wish the rest of the world did too)

    [Pat-4/3] A higher-level semantic of interaction is both new and not new. Specialized apps have clearly been doing it for a long time. Bringing it to the forefront of how we think about breaking our applications apart is new and is being popularized as SOA.

    [Pat-3/22] If this were about optimistic concurrency control, we would be pursuing a reincarnation of the same behavior. In that case, you would be correct that it is simply inventing a new vocabulary. It is not about the same behavior, though.

    [John-3/23] Once again, if you have more than one execution context you need a mechanism for managing concurrency. There are two ways, optimistic and pessimistic. I suspect you are so caught up in new words that you are forgetting that the concepts are not new.

    [Pat-4/3] Already covered above…

    [Pat-3/22] SOA is about interacting with a business-function semantic. It is also about the assumption that when you do your business function, it is only connected via messaging. This leads us to a style of interaction that is reminiscent of the way we interact with businesses. I may place a hotel reservation (and, perhaps, later on cancel that reservation). I don’t fiddle with the hotel’s backend database records.

    [John-3/23] Nothing here is new. By the way, you *are* actually fiddling with the hotel’s backend database records. I’m pretty sure that if I book a reservation at your hotel then your backend database records have changed. I reckon if I cancel they change too. Having a ‘front-end’ to a database isn’t anything new. Nor is having a ‘middle-tier’ that wraps access to it. I reckon if I cancel and my wife rings up the hotel at the same time to make sure our room is a smoking room (because I didn’t tell her I was cancelling yet, you know, waiting for the right time) then I have a race condition. If my wife wins the race condition then I’ll end up cancelling a reservation for a smoking room. If I win the race condition then my wife will be told that the reservation has been cancelled. That’s optimistic concurrency, each of us had the ‘booking number’ and ‘what we thought we knew about the state of the booking’, my wife probably wouldn’t have asked to change the room for a booking she knew was cancelled.

    [Pat-4/3] When I hear optimistic concurrency control, I hear a CRUD semantic across that boundary. This is probably a big part of the disagreement (which then would be, at least in part, a misunderstanding) between us. Of course the back-end database is being changed but it is being changed by the business logic in the service that owns the data.

    [Pat-4/3] SOA is different than middle-tier logic (and as one of the inventors of MTS which had strong influence on EJB I have strong opinions about this)… Middle-tier logic shared the same transaction as the presentation-tier. That is, you could begin a transaction in the presentation-tier and do multiple interactions with the middle-tier all under the same ACID transaction. I invented that stuff. In SOA, the interactions from the incoming partner are not atomically linked together. This leads to a different kind of weak-atomicity across multiple interactions. As I mentioned above, this shows up as the reserve and confirm/cancel behavior you see when companies interact.

    [Pat-3/22] This is why there’s a lot of excitement about SOA.

    [John-3/23] I don’t know why people are suddenly excited about old ideas.

    [Pat-4/3] They’re not getting excited about old ideas (except that some specialized apps have done this in specialized ways). SOA is new to very many people and the industry is making a lot of progress in working on the generalized principles of loose coupling and how to get more people to understand the principles and how to apply them. I think you are just missing the new ideas as you cast them into the old framework.

    [Pat-3/22] While it has been done before we came up with a new name

    [John-3/23] Yep. New name. Marketing. Buzz-word.

    [Pat-4/3] More than that, SOA offers new ideas to many people. Before OO became popularized, there were thought leaders doing many of the same basic things but not calling it OO. Just because there were nascent examples of thought leaders that called the ideas something else doesn’t reduce the value achieved by popularizing OO and giving the ideas names that everyone could agree on. We haven’t finished that process in the SOA world but we’re making great progress.

    [Pat-3/22] it has not been worked on with the same intensity and with the same hope for broad impact. What you posit for interaction (with optimistic concurrency control over direct access to the partner’s data) is definitely not SOA.

    [John-3/23] I’m not sure what you mean by ‘direct access’, but obviously your service provides data to clients. The service is the client’s interface to that data, and it can pass messages to get it and alter it. The messages are queued and validated. Again, SOA is just a buzz-word that offers nothing new.

    [Pat-4/3] See above.

    [Pat-3/22] The interaction is not about record reading and is not about ACID transactions that span the services ("bar"s).

    [John-3/23] The only way that services can communicate is with ‘messages’ that contain ‘data’. Data is a read-lock. If I have data I am implicitly involved in a distributed transaction with the service. So, you do have ACID transactions that span the services. In fact, all your service does is guarantee the ACID principles: Atomicity, Consistency, Isolation, Durability. It does it with distributed transactions that rely on optimistic concurrency. Yes, a single message needs to contain everything needed for an atomic transaction on the server. Anything less granular will be treated as a sequence of acceptable ‘transaction states’ and is fundamentally ‘what your application does’ or ‘what your business logic is’, such state transitions are necessarily not atomic.

    [Pat-4/3] See above.

    [Pat-3/22] In fact, I am trying to point out that SOA is about looseness between the services.

    [John-3/23] Loose-Coupling. Not new.

    [Pat-4/3] I agree that some people have used loose coupling successfully before. Most people are not there yet and the educational process will bring a lot of value.

    [Pat-3/22] The behavior of a collection of services should be identical even if one of them goes up and down intermittently (of course with the exception that the responsiveness of the collections of services is impacted).

    [John-3/23] Atomicity, Consistency, Isolation, Durability. Not new.

    [Pat-4/3] So I’ve been working on OLTP and Database platforms since 1978. That includes being chief architect for Tandem’s TMF (Transaction Monitoring Facility) which is the transaction processing plumbing for their NonStop systems from 1982 through 1990. I would never say that those systems behaved identically when a service is unavailable. Message-Oriented-Middleware (as typified by MQ-Series) does a great job at that. Anyway, I don’t attribute resilience to down time to ACID properties… now ACID properties sure as heck help but there is something more to it than that.

    [Pat-4/3] So I’ve noticed a pattern in this discussion. As features of the SOA environment are described you say “So what, XXXX had that feature”. We’ve talked about:

    — interfaces from components and objects,

    — concurrency control,

    — loosely-coupled deployment,

    — ACID behavior (and my assertion that SOA is different in not preserving ACID behavior), and

    — Asynchrony (and the possible availability benefits gained in a fashion like MQ-Series)

    and we haven’t yet talked about:

    — Schema and

    — Policy.

    In each case, you say “Nothing new!! Why create the new buzz-word?” What environment has all of these features? The fact that the attributes of SOA have been tried before is a feature!!! It means that we know it works!

    [Pat-3/22] The use of queues for the messages that connect the services allows for a great deal of tolerance of intermittent availability.

    [John-3/23] Queuing, Messaging. Not new.
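
    The claim that queues tolerate intermittent availability can be sketched in a few lines of Python. This is a toy under stated assumptions: `Service`, `send`, `drain` and `restart` are hypothetical names, and an in-memory `deque` stands in for what would have to be a durable queue in any real system.

```python
from collections import deque


class Service:
    """Hypothetical service that may be down when messages arrive."""

    def __init__(self):
        self.inbox = deque()  # stand-in for a durable message queue
        self.up = True
        self.processed = []

    def send(self, message):
        # Senders always succeed: they only append to the queue.
        self.inbox.append(message)
        self.drain()

    def drain(self):
        # Work happens only while the service is up; otherwise messages wait.
        while self.up and self.inbox:
            self.processed.append(self.inbox.popleft())

    def restart(self):
        self.up = True
        self.drain()
```

    If the service goes down after the first message, later calls to `send` still return normally, and after `restart` the messages are processed in their original order. The outcome matches the always-up case; only the latency differs.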

  16. Pat Helland says:

    Re: http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#95578

    re: More Discussion of SOA is like the Night Sky… 3/24/2004 6:15 PM John Cavnar-Johnson

    JohnCJ makes a great comment that the reality in any database is only as real as the data contained inside it. I think my phrase “God’s Truth” caused JohnCJ’s concern. I totally agree that databases can contain trash. In addition to Garbage-In-Garbage-Out, there are anomalies and logic bugs. I guess I’m hard pressed to see that this is an SOA related issue rather than an issue that pertains to all applications.

    He points out that relational databases (while powerful) need to have application semantics to correlate meaning to the real world. I agree.

    I like his point that looking at SOA through a relational prism can filter out all that is important about SOA. I love JohnCJ’s observation that SOA is about developing systems that more accurately reflect the true nature of business process.

    When I said the phrase “God’s Truth”, I wasn’t trying to express a belief that the database was necessarily accurate (and, hence, I take blame for a poor choice of words). I was trying to contrast the nature of data within the relational engine of the service (which is transactionally as accurate as the relational data can be) with the nature of data which has been unlocked and, hence potentially changed.

    I argue that data that has been transmitted in messages has been unlocked and, hence, is of a different nature than data that can be examined under the transactional lock that participates in the history of the data within the service.

    I was trying to point out that there is a special issue with the need to send the data out in a message and that this is a fundamental semantic that arises in a service-oriented world.

    Thanks for your comments!

    Love,

    Pat

  17. Pat Helland says:

    Re: http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#96490

    re: More Discussion of SOA is like the Night Sky… 3/25/2004 8:58 PM John

    I am responding to a subset of John’s points…

    [John-3/25] So far, it has been said that ‘SOA’ is new, and that ‘SOA’ is not new. I don’t think that the concepts are new; it’s what we have been doing for a long time. My gripe is that I keep hearing arguments or discussion about ‘what it is’ and it’s not anything new, so why don’t we just keep talking about stuff we already know? It’s still N-Tier development in my view, and in many ways is client-server, since we shouldn’t really care what is ‘behind’ our services; we only care about their interface.

    [Pat-4/3] You are missing some fundamental points behind SOA when you say that it is still N-Tier. I discussed these in http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#107212 that I just posted.

    [John-3/25] I like some of the stuff that Don Box said: "If I share an abstraction it has a cost. And the fundamental premise of service orientation is that we try to control sharing of abstractions." But we’ve always known this right? Since I can remember I’ve been taught and agreed that you should limit access to your abstractions, it is how you accomplish loose-coupling, and we know that loose-coupling is a ‘good thing’.

    [Pat-4/3] You are the one that claims SOA is just the same as N-Tier. That says that you don’t understand that SOA is severely more constrained in its intimacy than N-Tier. Transactions, chatty-interfaces, common understanding of the database and its schema, the ability to understand passed references, and much more are shared in an N-Tier environment that are not shared in an SOA.

    [John-3/25] I also like some of the stuff that Don Box said: "Fundamentally there is a broader thing here which is this move towards service-orientation, or called service oriented architecture by, you know, people who want to charge more money for consulting."

    [Pat-4/3] I would prefer to hear Don’s response. I believe (but could be wrong) that Don sees SOA as an important architectural trend, Web-Services as the leading effort to standardize the protocols, and Indigo as the premier environment to develop SOA implementations.

    [John-3/25] So, I’m not the only one that thinks SOA is just a buzz-word.

    [Pat-4/3] I still assert that this belief of yours is based on an incomplete understanding of what is meant by SOA. I feel the blame for that lies with the folks in the SOA community (including myself) for not communicating effectively.

    [John-3/25] Don has a big thing about ‘type’ and ‘schema’ that I don’t really share. I understand where he gets it from though. I still think it’s OK to say that ‘type’ is ‘schema’, ‘contract’ and ‘policy’. The problem is really only a ‘version’ problem. If I change my class’s implementation I change its version, making it incompatible with the previous versions. We need finer-grained versioning, so the ‘schema’ and ‘contract’ and ‘policy’ can change or stay the same across binary versions of my classes that support them. He forces the point because in earlier versions of SOAP the type schema was heavily versioned. This is not a new problem; it’s surprising how deeply it was built into SOAP given the history of this serialization/version problem.

    [John-4/3] In the .NET world the SOAP formatter will encode your ‘schema’ with a namespace comprised of your ‘class and version’. Meaning if you change a class’s version you effectively change the schema, even though the semantics have not changed. This is very, very painful. To use it effectively you really need to create an entire library simply to describe your schema so it can be versioned independently, and there aren’t great tools for this.

    [Pat-4/3] I am not up on this but I would agree with John that changing the schema when the class changes would suck.

    [John-4/3] Generally the types change because of code upgrades that fix bugs to come in-line with the expected ‘contract’ of a type, or to improve performance, etc. There might be a ‘consumer’ type and a ‘server’ type that share the same data but work with it differently (still honouring the implied ‘contract’ and enforcing any ‘policy’), etc. I agree that encoding your data with your type and version is madness, the schema changes far less often. That’s really why the point is so heavily made about ‘schema’. A DBMS is a good example of this because I can change the schema and not break everything that relies on unaltered parts of it, regardless of their ‘version’.

    [Pat-4/3] There is a difference between the back-end database’s schema and a message’s schema. The back-end database’s contents will evolve very dynamically through time. A single message is immutable. Once it is written, it is written and unchangeable forever. Still, you need to ensure that the schema for the messages can change and not break the application.
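
    One common way to let a message schema change without breaking the application is a ‘tolerant reader’: require only the fields you need, ignore fields you do not recognise, and default the ones older messages lack. The Python sketch below assumes a hypothetical `<Question>` message with `Id` and `Text` attributes, plus an invented `Language` attribute standing in for a later schema addition; it is an illustration, not how any participant in this thread implemented it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a received message is never modified
class QuestionMessage:
    id: int
    text: str
    language: str  # hypothetical field added in a later schema version


def read_question(xml_text: str) -> QuestionMessage:
    """Tolerant reader: require only what we need, ignore what we don't know."""
    elem = ET.fromstring(xml_text)
    return QuestionMessage(
        id=int(elem.attrib["Id"]),
        text=elem.attrib["Text"],
        # Older messages lack Language; fall back to a default instead of failing.
        language=elem.attrib.get("Language", "en"),
    )
```

    A message written before `Language` existed still parses, and a newer message carrying extra attributes the reader has never heard of parses too. The message itself stays immutable; only the reader's expectations evolve.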

    [John-3/25] I agree with Pat, most ‘services’ will use a relational backend. They don’t have to, but they typically will (for me anyway). A document management, or source control service may not for example.

    [Pat-4/3] I agree.

    [John-4/3] JohnCJ is right about the ‘goal’ of focusing on business process. Don is right the best method is by *limiting* interfaces.

    [John-4/3] One of my complaints is simply that these have always really been our goals. My other complaint is that of ‘transactions’ not being exposed by services. In the first instance I say that if you have data you have a read-lock (optimistic concurrency), meaning that anyone who can get your data can be involved in a transaction with you, that could span a great deal of time, and a great deal of messages. There are some ‘services’ which may simply *have to* issue pessimistic locks, for example a source control system during a batch merge where the user has to OK or fix several merge operations during a check-in.

    [Pat-4/3] Again, this is one of the major ways in which the SOA community is approaching things differently than you are. This is a strength!

    [John-4/3] The problem with pessimistic locks is what happens if the client is slow or ‘disappears’; some clients are more reliable than others. However, I’m not sure that optimistic concurrency is right in every situation (although personally I’d try to use it all the time). I don’t know that ‘locking’ at some point necessarily means you aren’t a service; Visual SourceSafe is a (terrible) case in point.

    [John-4/3] I don’t want to give anyone the impression that I don’t think service oriented architecture is a good idea, I just wanted to make the point that it isn’t new, and that most of the concepts apply equally to the more fine-grained aspects of your system.

    [Pat-4/3] It is new and it’s different than what you are arguing it is…

    [John-4/3] My point about the DBMS wasn’t really supposed to have anything to do with its ‘relational’ aspects. Yes, I think these are important, but I was really using it as an example of a service. It meets all the requirements of a service; its problem is that most implementations allow for too much of some things (mostly too flexible) and not enough of other things (security, complex validation, etc.), meaning that a wrapper is typically useful. The wrapper is obviously your service. I stand by my point that you could create a ‘service’ simply using SQL Server and stored procedures. You can define schema, contract and policy and expose a limited set of functionality focused on business process. The problem is that generally this would be both inefficient and difficult to implement compared to using imperative code in a front end service. Also, exposing your service yourself means you can use other more flexible or ubiquitous protocols (i.e. you are not stuck with ODBC, for example).

    [Pat-4/3] SQL Server is NOT an example of a service…

    [John-3/25] As for "God’s Truth", it’s in quotes because it is facetious in many ways. Obviously data in a system is only as correct as users make it (hopefully not any less correct however!). The point is that it is the shared ‘authoritative’ source for the data. If someone knows something that isn’t in there, then they should put it in there.. even this isn’t new: ‘garbage in, garbage out’..

    [Pat-4/3] I agree… see http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#107215

    [John-3/25] Lastly, I think the ‘night sky’ is a poor analogy for a whole heap of reasons, stars don’t really ‘send requests’ (although gravity is interesting, did you know that there is a concept of a ‘gravity wave’? I used to have a theory that perhaps long-distance instantaneous communication might be possible with gravity, but it turns out that gravity ‘information’ doesn’t travel faster than the speed of light, it’s an inverse-square law, so distance massively diminishes its effects, however, we are actually being attracted to where a star ‘was’, not where it ‘is’, although I guess the attraction is pretty much negligible, I think Stephen Hawking has a theory that if something is far enough away from you in space-time it can never affect you, which is interesting (but not really relevant to stars we can see)), because the concepts of messaging are really crucial, and because the stars just ‘stream state’ they are really ‘read-only’ services, which doesn’t flag concurrency as an issue (and it definitely is).

    [Pat-4/3] I was trying to point out that in an environment wherein different services DO NOT share transactions, you can only know what the other service’s data USED TO LOOK LIKE. You can’t know what it IS any more than we can know that Alpha Centauri is still there.

    [John-4/3] People in a team talking to each other might be though, a person can ask a question or make a statement, they might tell one or more people the same thing, each person will apply a set of rules to decide if they believe something or not, decide what they will do based on messages or requests, and generally just be discrete systems talking to other systems using some common messaging system (i.e. language) trying to get along and accomplish something.. Information will change and propagate in different ways, some people will work with old or invalid data until they find out their data is bad. If someone tells someone something containing old information they’ll probably be corrected, etc. The thing about humans is that they tend to process synchronously, they aren’t generally telling someone something they used to know at the same time they are finding out that it’s not true anymore (and sometimes they just lie, or stay silent, or talk to the wrong person, or are wrong, etc. basically they contain *bugs*, heh, it’s probably a feature).

    [Pat-4/3] You are absolutely correct that data gets more and more distorted the farther it gets away from its origin. This will be even more interesting as we start to deal more formally with the notion of information that was known to be accurate as of a certain point in time. What is the meaning of a set of data calculated based on a set of inputs that were accurate as of different times… lots to learn about!
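
    One possible convention for data ‘accurate as of a certain point in time’ is to carry the timestamp with every value and to say that a derived result is only as fresh as its stalest input. The Python sketch below assumes exactly that convention; `Observed` and `total` are hypothetical names, and other rules (for instance keeping the full set of input timestamps) would be equally defensible.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observed:
    value: float
    as_of: datetime  # when the source service said this was accurate


def total(inputs):
    """Sum the values; the result is only as fresh as its stalest input."""
    inputs = list(inputs)
    return Observed(
        value=sum(i.value for i in inputs),
        as_of=min(i.as_of for i in inputs),
    )
```

    Summing one value known as of 1 April and another known as of 3 April gives a total that can only be claimed accurate as of 1 April, which at least makes the staleness of the calculation explicit.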

    Love,

    Pat

  18. StefanG says:

    This is a response to John’s comment:

    http://blogs.msdn.com/pathelland/archive/2004/03/22/94000.aspx#104960

    Really interesting discussion this.

    John seems to say that SOA is nothing new, he has been doing it all for ages.

    Of course he is right in one way; the underlying technologies and principles are not really new, but the mindset required for a successful implementation of SOA is at least partially new.

    Let us look at John’s own example:

    <John>

    Survey (Id)

    Question (Id, SurveyId, Text)

    Answer (UserId, QuestionId, Text)

    I have a conversation like this:

    User to Service: REQUEST Survey:42

    Service to User:

    <Survey Id="42">

    <Question Id="123" Text="What is your favourite colour?"/>

    <Question Id="124" Text="Do you like cheese?"/>

    </Survey>

    Manager to Service: REQUEST Survey:42

    Service to Manager:

    <Survey Id="42">

    <Question Id="123" Text="What is your favourite colour?"/>

    <Question Id="124" Text="Do you like cheese?"/>

    </Survey>

    Manager to Service: POST UPDATE

    <Survey Id="42">

    <Question Id="123" Text="Is your favourite colour green?"/>

    <Question Id="124" Text="Do you like cheese?"/>

    </Survey>

    [Service conducts business processing, ensuring that there are no existing answers for this survey because the semantics of Question:123 have changed, there are none so the requested changes are applied]

    User to Service: POST INSERT

    <Survey Id="42">

    <Answer QuestionId="123">Blue</Answer>

    <Answer QuestionId="124">Yes!</Answer>

    </Survey>

    [Service conducts business processing, everything seems in order, results are inserted]

    Customer to Service: REPORT Survey:42

    <Survey Id="42">

    <Result>

    <Question Text="Is your favourite colour green?">Blue</Question>

    <Question Text="Do you like cheese?">Yes!</Question>

    </Result>

    </Survey>

    Customer to Manager: I’m getting bogus data in my reports! Your users are idiots!

    Manager to User: Read the friggin question! You’re hopeless!

    User to Manager: Screw you! I quit!

    </John>

    The example above demonstrates what can happen if a service interface is designed without keeping the fundamental SOA principles in mind.

    The problem is that the INSERT request does not contain enough information for the service to determine that there was a concurrency violation.

    The INSERT request should really look something like this:

    <Response SurveyID="42" UserID="123">

    <Answer Id="123" Text="What is your favourite colour?">Blue</Answer>

    <Answer Id="124" Text="Do you like cheese?">Yes!</Answer>

    </Response>

    With a document like that, the service can determine that the question text has changed since the user downloaded the survey and it could either record the fact that this user has Blue as his favorite color, or (most likely) issue an error so the user can get the updated survey and respond to the proper question.

    (Of course, there are other ways of handling this, you might for example make it impossible to change a survey once it has been published)
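
    The check described above can be sketched in Python. This is an illustration under assumptions, not a real service: `Answer`, `accept_response` and `StaleSurveyError` are hypothetical names, and a plain dictionary stands in for the service's current survey state. Each answer echoes the question text the user actually saw, and the service rejects the response if that text no longer matches.

```python
from dataclasses import dataclass


class StaleSurveyError(Exception):
    """Raised when an answer was written against an outdated question."""


@dataclass(frozen=True)
class Answer:
    question_id: int
    question_text: str  # the question text the user actually saw
    value: str


def accept_response(answers, current_questions):
    """Accept the answers only if each one echoes the current question text."""
    for a in answers:
        current = current_questions.get(a.question_id)
        if current is None:
            raise StaleSurveyError(f"question {a.question_id} no longer exists")
        if current != a.question_text:
            raise StaleSurveyError(
                f"question {a.question_id} changed since the survey was downloaded"
            )
    return {a.question_id: a.value for a in answers}
```

    With the manager's edit applied, an answer of "Blue" that still carries "What is your favourite colour?" is refused, while the unchanged cheese question goes through. This sketch takes the ‘issue an error’ branch; recording the answer against the old text would be the other option mentioned above.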

    One of the main reasons why the SOA concept is important is that it forces developers to think about issues like this.

    When designing a service interface you should always be prepared to handle concurrency issues.

    So, in short: If you design a service interface like the one you showed, you have not understood what SOA is all about.

    /StefanG

  19. Rajiv says:

    SOA is linked to the concept of Web services. Web services can be thought of as a consumer-provider relationship on the Web. The Web service consumer makes a request to the provider and the provider responds