So, I occasionally ponder the D in ACID transactions and wonder what it REALLY means.
Observation #1: Committed Subject To…
When I was working at Tandem in the 1980s, we had a complex (and fascinating) multiprocessor system with dual-ported disk drives, dual-ported IO controllers, multiple (2 to 16) processors connected via a message-passing bus and it was fault tolerant. I’m very proud of that part of my career when I worked on TMF (Transaction Monitoring Facility) which implemented the transaction logging and recovery spread across the Tandem Computer’s NonStop system. While working there, I first realized the spectrum of commitment that happens over time. When a transaction would commit on TMF, it would get progressively more committed over time:
- Committed subject to App process failure. When the call from the application processor to the OS on the local processor (one of the 2 to 16 in the multi-processor) is made, the application process can then fail and the transaction will still commit.
- Committed subject to processor failure. One of the first things TMF does when processing a commit is to send a packet to its next neighbor indicating the intention to commit the transaction in question. Once that packet is received by the neighboring processor, the failure of the application’s original processor would no longer result in the transaction being aborted.
- Maybe committed subject to system (multi-processor) failure. When the commit record is written to the tail of the log, the mirrored writes are (or were back in my day) always done sequentially to ensure that the rewrite of the last block did not lose the previous contents of the last block. So, you went from neither mirror having the commit record to one mirror having the commit record… we haven’t yet written the second mirror. A crash at this point would result in an attempt to read the tail of the log after restart. It was the luck of the dice as to which mirror was read but whichever tail was read, that was rewritten onto the other mirror to ensure a consistent opinion of the tail of the log. Hence, we’ve entered a window in which the transaction has a 50% chance of being committed if the entire system crashes and restarts. Is this transaction durable???
- Committed subject to system (multi-processor) failure. When the commit record is written to the tail of the log on the second mirror, you had a (close to) 100% chance of having the transaction remain committed in the face of a system crash and restart.
- Committed subject to the destruction of the data-center. This would occur when the log was shipped offsite either via tape or by squirreling it across the network to a backup site.
- Committed subject to thermo-nuclear exchange. Now, THIS one was a tough one. You have to send the backup tape to sit under some mountain to ensure the transaction is REALLY committed if this happens… I’m not sure we achieved this…
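One way to picture this spectrum is as an ordered scale, where each step of commit processing raises the class of failure the transaction can survive. The sketch below is only an illustration and is not TMF code: the enum names, the milestone strings, and the `recover_log_tail` helper are all made up for this example. The helper mimics the restart behavior described above, where whichever mirror's tail happens to be read is copied over the other.

```python
from enum import IntEnum


class Survives(IntEnum):
    """Worst failure a commit is known to survive (illustrative names)."""
    NOTHING = 0
    APP_PROCESS_FAILURE = 1
    PROCESSOR_FAILURE = 2
    MAYBE_SYSTEM_FAILURE = 3      # only one mirror holds the commit record
    SYSTEM_FAILURE = 4            # both mirrors hold the commit record
    DATA_CENTER_DESTRUCTION = 5
    THERMONUCLEAR_EXCHANGE = 6


# Hypothetical milestone names for the steps in the list above.
MILESTONES = {
    "commit_call_reached_os": Survives.APP_PROCESS_FAILURE,
    "neighbor_received_intent": Survives.PROCESSOR_FAILURE,
    "first_mirror_written": Survives.MAYBE_SYSTEM_FAILURE,
    "second_mirror_written": Survives.SYSTEM_FAILURE,
    "log_shipped_offsite": Survives.DATA_CENTER_DESTRUCTION,
    "tape_stored_under_mountain": Survives.THERMONUCLEAR_EXCHANGE,
}


def commitment_level(milestones_reached):
    """Return the strongest level reached so far."""
    level = Survives.NOTHING
    for milestone in milestones_reached:
        level = max(level, MILESTONES[milestone])
    return level


def recover_log_tail(mirrors, pick):
    """Restart after a crash: read ONE mirror's tail (chosen by `pick`)
    and rewrite it onto every mirror so they agree on the log tail."""
    tail = list(pick(mirrors))
    for mirror in mirrors:
        mirror[:] = tail
    return tail


# Crash inside the "maybe committed" window: mirror A has T1's commit
# record, mirror B does not. The outcome depends on which one is read.
mirror_a, mirror_b = ["T0 commit", "T1 commit"], ["T0 commit"]
recover_log_tail([mirror_a, mirror_b], pick=lambda ms: ms[0])
t1_survived = "T1 commit" in mirror_b   # True only because mirror A was read
```

Asking "is it durable?" then becomes asking "durable against which failure?", that is, comparing `commitment_level(...)` against whatever level the caller cares about.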
So, when is the transaction durable? What about a transaction whose commitment is recorded on multiple machines but only in volatile memory (but with enough of them to have as many nines availability as you want)? Hmmm… this durable stuff is annoying (just like all the letters in ACID).
Observation #2: Commit Dependencies and the Event Horizon
I am reminded of IMS (IBM’s Information Management System) and its special version called IMS Fast Path. What I’m about to say may be apocryphal but this is how I think it works based on occasional conversations with transactional old-farts…
As I understand it, there was this cool optimization called “internal commit”.
IMS worked as a database management system integrated with 3270 block mode terminals. Work happened in three steps:
- The user filled out the screen and pushed “enter” which caused the image of the changed fields to be sent to a terminal controller. The terminal controller interacted with IMS to ensure the input image of the screen was transactionally recorded. Only after the input was logged would the work of the transaction against the database be performed.
- The incoming data was dequeued, the work of the transaction was performed (which likely modified data in the database), and the output screen image was enqueued.
- The output screen image was flashed up for the user to see and a transaction performed to delete the output screen.
This has the characteristic that it is IMPOSSIBLE to see the contents of the database other than by looking at the effects of a committed transaction. Unlike most DBMS systems today, you could not begin a transaction, examine record-X, and display the results without committing the transaction that LOOKED at Record-X. Just the act of reading required a commitment to see the effects outside the system.
So, now what the heck is an internal commit? The idea is to count on the single log buffer in this mainframe based system. If transaction-T1 is committed and its updates placed into the transaction log in memory (but not flushed to disk), and transaction-T2 comes along while the system is still running, T2 cannot commit unless T1 commits. T1 will commit if the system remains alive long enough to flush the transaction log to disk. T2 is running on the same system and can only commit using the SAME log and assuming the system stays up long enough for T2 to get its changes written to the log on disk. If T2 succeeds in doing that, T1 most definitely will have committed! This is called a commit dependency –> T2 has a commit dependency on T1.
Leveraging these two concepts (the commit dependency and the fact you can’t see any effects outside the system other than through a transaction commit), IMS-FastPath would play this cute trick. Say transaction-T1 modifies record-X and is on its way to committing. Once the commit record is in the log buffer in memory, then Record-X could be unlocked. This would be heretical in most systems because a crash might cause the loss of the new value for Record-X (remember, transaction-T1 is not yet durable when it is unlocked). This worked without anomalies because of what I call the event horizon.
An event horizon (my terminology) refers to the ever increasing scope of knowledge and our ability to leverage knowledge with some assumptions about its propagation. It is OK for transaction-T1’s changes to record-X to be unlocked because no one can tell the difference! If you see the new value for record-X, your fate is lashed to the success of transaction-T1. You have a commit dependency on transaction-T1 (either because of the design constraints defined above OR because transaction-T1 is actually committed and durable on disk).
So, for transaction-T2 which is looking at transaction-T1, it appears durable when the changes are in the buffer in main memory due to the event horizon effect. It’s like that spy movie: “I could tell you but then I’d have to kill you…” The knowledge doesn’t matter if all the impacts of its use are eliminated from the system.
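Here is a minimal sketch of the internal-commit idea as described above, assuming a single in-memory log buffer that is flushed to disk strictly in order. It is not IMS code: `LogBuffer`, `Database`, `internal_commit`, and `may_reply_to_terminal` are hypothetical names chosen for illustration. The point it tries to show is that a later commit record can only become durable if every earlier one does too, so unlocking a record before the flush is invisible to anyone outside the system.

```python
class LogBuffer:
    """Single in-memory log, flushed to disk as an in-order prefix."""

    def __init__(self):
        self.records = []   # position in this list + 1 is the record's LSN
        self.flushed = 0    # how many records are on disk

    def append(self, record):
        self.records.append(record)
        return len(self.records)          # LSN of the new record

    def flush(self):
        self.flushed = len(self.records)  # everything so far reaches disk

    def crash(self):
        del self.records[self.flushed:]   # the unflushed tail is lost

    def is_durable(self, lsn):
        return lsn <= self.flushed


class Database:
    def __init__(self):
        self.log = LogBuffer()
        self.values = {}
        self.locked = set()

    def internal_commit(self, txn, key, value):
        """Update `key`, put the commit record in the log buffer, and
        release the lock BEFORE the record is flushed to disk."""
        if key in self.locked:
            raise RuntimeError(f"{key} is locked")
        self.locked.add(key)
        self.values[key] = value
        lsn = self.log.append((txn, key, value))
        self.locked.discard(key)          # early unlock: the cute trick
        return lsn

    def may_reply_to_terminal(self, lsn):
        """Effects leave the system only once the commit is on disk."""
        return self.log.is_durable(lsn)

    def crash(self):
        self.log.crash()
        self.locked.clear()
        # Rebuild state from whatever part of the log survived.
        self.values = {key: value for _, key, value in self.log.records}


db = Database()
lsn1 = db.internal_commit("T1", "X", 10)
# T2 reads T1's not-yet-durable value, so T2 has a commit dependency on T1.
lsn2 = db.internal_commit("T2", "Y", db.values["X"] + 1)
```

Because `lsn1 < lsn2` and the flush always covers a prefix of the log, `is_durable(lsn2)` implies `is_durable(lsn1)`. If the system crashes before the flush, both T1 and T2 vanish together and no terminal was ever told otherwise.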
Observation #3: Dialog Semantics and Visibility of Failures
I spent a couple of years working on a feature of SQL Server (shipped in MS SQL Server 2005) called SQL Service Broker. Service Broker defined a notion of a dialog which implements transactionally-consistent, exactly-once, in-order messaging between two “services” which are reified by their state in the database. The notion of dialogs was that the messages would be delivered transactionally, exactly-once, in-the-order-sent, within a timeout window OR a dialog-failure was delivered to the service. SQL Service Broker ONLY provides services whose state was represented in a SQL database and, hence, counted on the durability guarantees of SQL Server.
As you try to understand the semantics of message delivery guarantees, it is essential to think about WHO is being provided with the guarantee and what THEIR durability is. The more we thought about this, the more it was clear that the dialog failed precisely when it LOOKED like it failed. Consider Service-A in a dialog with Service-B. The dialog has a timeout. If Service-A cannot receive a message before the timeout, the Service Broker must give it a Dialog-Failure message. The guarantee of delivery of this message is null and void if Service-A is not around to receive the dialog failure…
While in Service Broker within MS SQL Server 2005 only fully durable services are supported, it is meaningful to support more permutations. Consider Service-A and Service-B, each of which may be either durable or in-memory. The durable flavor means that receiving and/or sending a message occurs only when the change to the service happens on disk in the database (just like changing a record in SQL Server). The in-memory flavor means that the state is just kept in memory; a system failure wipes out the state.
Consider four permutations:
- Durable-Service-A has a dialog with Durable-Service-B. In this case, we don’t have to worry about one end surviving and the other not [well… more discussion below].
- Durable-Service-A has a dialog with In-Memory-Service-B. Now, if Service-B crashes, the dialog will time-out, causing a dialog failure delivered to Service-A. Service-A knows that the flaky In-Memory-Service-B hasn’t communicated within the time-out window and it will do the appropriate thing based on the death of Service-B.
- In-Memory-Service-A has a dialog with Durable-Service-B. Symmetric with case 2 above…
- Both Service-A and Service-B are In-Memory and are connected with a dialog. Here, we have two interesting sub-cases. First, if Service-A and Service-B are both in the same failure unit, a failure which wipes out BOTH of them has no end-point visible semantics. Neither of them is around to see the damage! Second, if one is on Computer-A and the other on Computer-B, it is entirely possible that one lives and one dies… The time-out on the dialog provides the semantics of failure (and is delivered as a “dialog-failure” message transactionally delivered to the survivor(s)). Indeed, the notion of transactionally delivering to an in-memory service is fascinating. Really, we mean delivered with a durability as great as the durability of the observer!
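The four permutations above can be sketched as a small function that answers one question: after a failure, who is still around to be handed a dialog-failure? This is a toy model and not the Service Broker API; `Endpoint`, `survives`, and `dialog_failure_recipients` are hypothetical names. It also assumes, for simplicity, that a durable service always comes back with its state intact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    durable: bool        # state kept in the database rather than in memory
    failure_unit: str    # e.g. the computer the service runs on


def survives(endpoint, failed_units):
    """Assumption of this sketch: durable state always survives, while
    in-memory state is wiped when its failure unit goes down."""
    return endpoint.durable or endpoint.failure_unit not in failed_units


def dialog_failure_recipients(a, b, failed_units):
    """Names of the endpoints that will observe a dialog-failure
    (delivered after the dialog's time-out) because the partner is gone."""
    a_alive = survives(a, failed_units)
    b_alive = survives(b, failed_units)
    if a_alive and not b_alive:
        return [a.name]
    if b_alive and not a_alive:
        return [b.name]
    return []   # both alive: no failure; both gone: nobody left to notice


durable_a = Endpoint("Service-A", durable=True, failure_unit="Computer-A")
memory_a = Endpoint("Service-A", durable=False, failure_unit="Computer-A")
memory_b = Endpoint("Service-B", durable=False, failure_unit="Computer-B")
memory_b_colocated = Endpoint("Service-B", durable=False, failure_unit="Computer-A")

# Case 2: the in-memory partner dies and the durable side gets the failure.
case_2 = dialog_failure_recipients(durable_a, memory_b, {"Computer-B"})
# Case 4, same failure unit: both are wiped out and no one sees anything.
case_4_same_unit = dialog_failure_recipients(memory_a, memory_b_colocated, {"Computer-A"})
```

The empty result for the co-located case is the interesting one: the failure has no end-point visible semantics because there is no surviving observer to deliver it to.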
Wow… so this leads us to an interesting observation. In case-1 (durable-to-durable), what if one of the two services is in a triple-data-center redundant high-availability site and the other is on a laptop? It is possible (actually easy) to lose the laptop and have the fact that it is durable on disk be irrelevant. So, what are the semantics of the dialog? What you know is that the partner service (on the other side of the dialog) either responds in time or doesn’t. If the partner DOES respond and complete its part of the work (and finish the dialog), the partner may be blown to smithereens and you think you have done some cooperative work. Yet, the partner (and its part of the work) are gone! Case-4 (both in-memory) also has the dilemma that you don’t know ANYTHING about the partner except whether it has sent you the correct messages. Basically, the only thing you know in ALL the cases is if the partner responded in a timely fashion. Once the work is completed, you have no REAL guarantee that it has stuck and won’t be blown up.
During the dialog, you have a relationship that can detect the failure and amnesia via the time-out and dialog-failure. After the dialog successfully completes, you are disconnected and cannot tell if the partner’s state (and work) are lost in a failure. This is intractable and is really a spectrum of durability (from in-memory to “committed subject to thermo-nuclear exchange”). You have a programmatically visible relationship and then you don’t!
Durability Is in the Eye of the Beholder
So, let’s consider this proposition that “durability is in the eye of the beholder”. Who cares if a transaction is durable?? Remember, I am not (in this blog entry) questioning Atomicity, Consistency, or Isolation. If you see ANY effect of the transaction, you see ALL the effects of the transaction. Why did we have this D thing in ACID, anyway??
Well, it turns out the old-farts (including me) doing the old-time transaction systems just kinda’ assumed a special case for interacting with the human. We knew that we needed to tell the person using the system “OK… We did it!” but we didn’t talk a lot about this being an example of a messaging relationship in which the human is one participant in a two-party messaging pattern. Also, as I am about to discuss joint failure and the destruction of state at both ends of the messaging relationship, it is slightly uncomfortable to talk about the human being obliterated in the same way as I am about to discuss the annihilation of a communicating service. I’m really nicer than that…
If we assume atomicity and/or a long-running relationship like a dialog, there is a window of time during which you can programmatically tell if the remote work is lost in a failure. Once you exit that window, you make assumptions about the remote work persisting (being durable) even when you are NOT keeping tabs on it. There are cases in which the loss of the remote work only occurs when YOU are wiped out… those are convenient because you aren’t around to notice the problem. There are other cases in which we just ASSUME that the probability of loss of the remote work is low enough that we ignore the dilemma.
Basically, big, complex, and distributed systems are big, complex, and distributed. We can’t get perfect behavior out of them. Something needs to be durable only if I, the partner, am still around to notice the need for it to be durable!