Transactional memory in a real world


(by Sasha Dadiomov) 


 


Isn’t it a common phenomenon that each thing has many faces?  If you were following the transactional memory community for some time, you probably saw it as a pretty theoretical area. There is obviously a lot of science here – discussions about things like “AME calculus” don’t leave much doubt. But there is more to be addressed before transactional memory becomes useful in practice…  Happy Users = Great Theory + Good Implementation + Something More.  This “Something More” is what I will try to explore in this blog.


So, imagine you have a wonderful implementation of Transactional Memory: correct, safe, scalable and quick. Will it be picked up by the world’s developers?  Well, let’s pretend to be one of them.


On day one, you wrote a few samples, played with performance, and became excited. On day two, you started to use it in a new feature of your current hairy project. I bet your new feature will not work perfectly after your last keystroke from yesterday’s frantic coding session… well, how about debugging it? Here, a naïve vendor of transactional memory may let you down. First, there may be multiple re-executions. Do you want your debugger to step through them all? And don’t forget that, with optimistic concurrency control, the state observed before validation (which typically happens at the end of the transaction) may be inconsistent. That means you will see values that just don’t make sense. Yes, eventually the TM will notice, roll back the state and re-execute, but you don’t know that yet… you are sitting in front of a debugger, seeing x equal to 1 right after the previous line x=2. Isn’t that confusing? Even if you were warned about optimistic concurrency control, you may find it hard to reason about your program through this distorted mirror…
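
To make the danger concrete, here is a tiny sketch (in Python, with invented names; no real TM implementation is implied) of how optimistic reads can observe an inconsistent snapshot that only validation, much later, will notice:

```python
# Minimal sketch of an optimistic-read STM: reads are logged but not
# validated until commit, so a transaction can see "impossible" values.
# All class and method names here are illustrative.

class VersionedCell:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def write(self, value):          # a write from some other thread
        self.value = value
        self.version += 1

class Transaction:
    def __init__(self):
        self.read_log = []           # (cell, version seen at read time)

    def read(self, cell):
        self.read_log.append((cell, cell.version))
        return cell.value

    def validate(self):
        # True only if nothing we read has changed since we read it
        return all(cell.version == v for cell, v in self.read_log)

x = VersionedCell(2)
tx = Transaction()
first = tx.read(x)        # sees 2
x.write(1)                # another thread sneaks in between the reads
second = tx.read(x)       # sees 1: "x equal to 1 right after x=2"
assert (first, second) == (2, 1)   # inconsistent snapshot, no error yet
assert not tx.validate()  # only validation notices the inconsistency
```

A TM-aware debugger would effectively run `tx.validate()` before every stop, so the inconsistent snapshot is never shown to the user.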


Well, this problem can be solved. The debugger may be taught to validate a transaction before stopping at a breakpoint or stepping, so you will not see inconsistent state. But that means the TM package must include a modified debugger. By the way, there are more things to fix in the debugger. For instance, shadow-copy-based TM implies that there will be multiple per-transaction states of the data, so the debugger must show the correct one. Consider debugging a program with two threads: in thread A, with no transaction, the value of x is 1; but thread B’s transaction may be setting x to 2. When you view the value of x under the debugger, you want to see 2 when stopped in thread B, but probably 1 when stopped in thread A (showing 2 is problematic, since transaction B hasn’t committed yet – what if, in the next re-execution, B’s transaction sets x to 3? Changes should be invisible outside a transaction until commit). So even validating transactions in the debugger is not enough: it additionally has to navigate between global and in-transaction states to show correct values. An alternative solution would be to stop all threads except one while debugging, but this is like looking for a wallet under the lamppost – you don’t want to change the real scenario for the sake of debugging ease.
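
The two-thread example above boils down to a per-view lookup. A rough sketch (hypothetical names; no real debugger API is implied) of what a TM-aware debugger has to do when displaying a variable:

```python
# Sketch of shadow-copy state: each transaction buffers its own writes,
# so "the value of x" depends on whose view is being asked for.

globals_heap = {"x": 1}              # the committed, globally visible state

class Tx:
    def __init__(self):
        self.writes = {}             # transaction-local shadow copies

    def write(self, var, value):
        self.writes[var] = value     # invisible to others until commit

    def read(self, var):
        return self.writes.get(var, globals_heap[var])

def debugger_view(var, tx=None):
    # Stopped in a transactional thread: show its speculative value.
    # Stopped in a non-transactional thread: show the committed value.
    if tx is not None and var in tx.writes:
        return tx.writes[var]
    return globals_heap[var]

b = Tx()
b.write("x", 2)                      # thread B's uncommitted write
assert debugger_view("x", tx=b) == 2   # stopped in thread B: see 2
assert debugger_view("x") == 1         # stopped in thread A: see 1
```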


OK, the previous paragraph was intended to show that a TM vendor has to deliver a debugging solution with it. The same may apply to a profiler: while a sampling profiler will probably work as usual, an instrumenting one may get fooled by re-executions. And there are important new questions now – e.g. which variables cause most of the conflicts? These are your contention points – you will probably change your design to ease the bottleneck. Most of the execution time may go into re-execution because of a single variable being modified by several threads concurrently – but how do you find it?
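
The statistic a TM-aware profiler would need is simple in principle: count, per variable, how often it caused a rollback, so the hot spot stands out. A minimal sketch (the runtime hook `record_abort` is invented for illustration):

```python
# Per-variable conflict counting: the kind of data a TM-aware profiler
# would have to collect to point at contention bottlenecks.

from collections import Counter

conflict_counts = Counter()

def record_abort(conflicting_vars):
    # called by the (hypothetical) TM runtime on every rollback,
    # passing the variables whose conflicts caused it
    conflict_counts.update(conflicting_vars)

# pretend the runtime reported these rollbacks during a run:
record_abort(["head"])
record_abort(["head"])
record_abort(["head", "len"])
record_abort(["tail"])

hottest, hits = conflict_counts.most_common(1)[0]
assert (hottest, hits) == ("head", 3)   # 'head' is the variable to redesign around
```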


And what about tracing? Do you want to see traces from all re-executions, or only from the valid ones? Do you want to see traces with inconsistent values, or will they just distract your attention from the real problems when you read the trace? Do you want to see events with inconsistent data?


Well, the picture is clear… changing the fundamental behavior never goes unpunished: the whole ecosystem of tools needs to follow the change. Tooling is the first element of “Something More”.


Now let us look from a totally different perspective: how do atomic blocks co-exist with other facilities used in your program? 


For starters, what about your old traditional transactions (e.g. MSDTC or System.Transactions in Windows)? After all, these were in wide use before transactional memory’s inventors went to school. Developers worldwide know what transactions are; it would be utterly misleading if TM transactions were different. It would also be a pity if these two kinds of transactions were not usable together. Both strive for atomicity and isolation: everything inside a transaction happens all-or-nothing, and the world is isolated from intermediate states. The difference is the domain: TM serves memory, while traditional transactions typically serve databases, queues, files, and so on – but usually not memory! Why? Wouldn’t it be beneficial to have a guarantee that all your operations inside a transaction are atomic and isolated – database, networking, memory, etc.?


I personally think it could make programs much simpler by eliminating the need to handle all possible failure combinations. Imagine a program that executes some algorithm in memory, but also needs to keep a reliable and consistent reflection of the memory elements in a database. Your code will change data in memory and in the database; you want them to always be equal, i.e. you cannot tolerate non-logged memory changes. To achieve that, you will need to roll back memory changes in case of database failures. But “roll back” smells familiar… isn’t that what TM knows how to do? And aren’t system transactions designed to serve multiple resources? And why can’t memory be one of them? It seems we will miss a lot of potential benefits if we don’t integrate these technologies. Moreover, TM without integration with system transactions runs a risk of confusing people, who will keep asking “which transaction did you mean?” TM vendors would probably have to avoid the word “transaction” altogether… keeping in sync with the programmer’s mindset is another part of the “Something More”. Please note, though, that it would be bad to sacrifice the performance of pure memory scenarios for the benefit of wider ones; that makes the implementation even harder… it should work optimally by itself and “better together” with system transactions.
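
The idea of memory as “just another resource” can be sketched with a toy two-phase commit: buffer memory writes, and apply them only if every enlisted resource prepares successfully. (Hypothetical protocol and names; real coordinators such as MSDTC are far richer.)

```python
# Memory enlisted alongside a database in one transaction: if the
# database fails to prepare, the buffered memory writes vanish too,
# giving failure atomicity across both resources.

memory = {"balance": 100}            # committed in-memory state

class MemoryRM:
    def __init__(self):
        self.buffer = {}             # speculative writes
    def write(self, key, value):
        self.buffer[key] = value
    def prepare(self):
        return True                  # in-memory writes can always commit
    def commit(self):
        memory.update(self.buffer)
    def rollback(self):
        self.buffer.clear()          # speculative writes simply disappear

class FailingDatabaseRM:
    def prepare(self):
        return False                 # simulate a database error
    def commit(self): pass
    def rollback(self): pass

def run_transaction(rms, work):
    work()
    if all(rm.prepare() for rm in rms):   # phase 1: prepare
        for rm in rms: rm.commit()        # phase 2: commit
        return True
    for rm in rms: rm.rollback()
    return False

mem = MemoryRM()
ok = run_transaction([mem, FailingDatabaseRM()],
                     lambda: mem.write("balance", 50))
assert not ok
assert memory["balance"] == 100      # memory rolled back with the database
```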


OK, we started speaking about co-existence with other features. Let’s look now at the numerous I/O calls scattered around your program – or, more broadly, any calls with side effects (e.g. killing another process). What if your newly introduced atomic block includes some of them? Well, a naive TM could simply re-execute them, e.g. repeat your printf multiple times. Even worse – since it is now possible for the program to run on inconsistent (read: random) data, awful things may happen… for instance, “if(x)FormatDisk()” may actually be called with a random value of x! This is a well-known TM problem, and any TM has to take care of it. Usually that “care” is red tape: just forbid it. Very limiting… it makes one doubt whether one wants to use TM at all… so the next variant is “irrevocability”: whenever a transaction attempts its first I/O, it switches from speculative mode to a global lock – no re-executions are possible anymore. The cost is serialization – all other transactions in the system will be required to take the global lock, or be stopped. This approach is viable, but should probably be the last line of defense, since it hurts scalability so much. Integration with system transactions opens several better avenues, utilizing a resource manager’s ability to postpone or roll back its actions (e.g. easily defining a custom resource manager around the necessary side-effecting actions). Also helpful would be the ability to defer some actions until commit time, or to register compensations for the rollback case. And any approach will require some checks – which APIs are callable under a transaction, and which should be forbidden.
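
The two escapes mentioned above, deferral and compensation, fit in a few lines. A sketch (illustrative API names, not any real TM interface): deferred actions run only on commit, so the printf problem disappears; compensations run only on rollback, undoing effects that had to happen immediately.

```python
# Deferred and compensating side effects inside a transaction:
# defer = run only if we commit; compensation = undo if we roll back.

class SideEffectTx:
    def __init__(self):
        self.deferred = []           # actions postponed until commit
        self.compensations = []      # undo actions for rollback

    def defer(self, action):
        self.deferred.append(action)

    def do_with_compensation(self, action, compensation):
        action()                     # must happen now (e.g. a message send)
        self.compensations.append(compensation)

    def commit(self):
        for action in self.deferred:
            action()

    def rollback(self):
        for undo in reversed(self.compensations):
            undo()

out = []
tx = SideEffectTx()
tx.defer(lambda: out.append("printf"))            # the printf problem: deferred
tx.do_with_compensation(lambda: out.append("sent"),
                        lambda: out.append("cancel-message"))
tx.rollback()            # a re-execution: printf never ran, the send was compensated
assert out == ["sent", "cancel-message"]
```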


Well, we have mentioned a lot of things that a shipping TM has to address. There is more, of course; what I wanted to show was just another face of transactional memory, the “outer” one – its integration with the development and debugging environment, and with existing concepts and bodies of code. Compatibility with the environment is that mysterious “Something More” from the beginning of this blog.

Comments (10)

  1. Alex Baldassin says:

    I have two comments. The first one is about I/O calls from inside transactions. It is not my main focus of work, but at first sight it seems that they shouldn’t be allowed, since we are performing transactions on memory (that’s why it’s called transactional memory, after all). Even if you consider that most I/O operations are buffered, the problem still persists.

    My second comment is actually a question. Have any of you read the experiences related by a group from IBM about STM ("STM: why is it only a research toy?" ACM Queue)? I talked to one of the authors (Siddhartha) in a recent conference and he looked quite skeptical about the future of both HTM and STM. Although this post discusses some related issues, how do you guys see the future of STM in more general terms? Or in other words, have you found an application domain where STM really seems promising in terms of programmability and/or performance?

    -Alex.

  2. Sasha Dadiomov says:

    Hello Alex,

    Let me answer the first of your questions (about I/O); we will answer the second one separately.

    Yes, in academic settings transactional memory is usually associated with memory only. But that does not seem to be sufficient in practical applications of TM: in many real scenarios, memory actions go along with logging, communication, database operations, and so on. You cannot always easily separate memory changes from these actions; we also think that in many cases you actually don’t want to separate them at all. Failure atomicity of a combined memory / non-memory operation may be a very desirable reliability feature – it makes error processing much more straightforward. For instance, if your program controls some real process or a game, it probably has some model of the process in memory, and it probably wants to keep that model in sync with reality. That would mean that sending a control signal to the process and reflecting it in memory should be parts of the same transaction.

    Thanks, Sasha

  3. Sasha, I do agree that it’s not realistic to forbid I/O operations from occurring transactionally.

    The problem with I/O operations (or side effects in general) is that they don’t fit the TM model, because you cannot just automatically roll back their effects if you need to. That’s particularly true for inputs. You cannot ask for user input inside a transaction, because if it is rolled back the user will notice.

    For output actions you may attempt to defer the operation or create compensation code, but its applicability in the general case is a bit unclear to me. Moreover, it’s highly likely that programmers will have to take part in devising compensation code and/or specifying whether the operation should be deferred to commit time. I am afraid programmers would prefer not to use TM at all in that case.

    Taking into account the ‘programmability’ aspect, maybe it’s better to execute the transaction in a non-speculative manner. Note that in this case you can still allow other read-only transactions to proceed in parallel.

    Regards,

    -Alex.
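
The non-speculative fallback Alex describes, where one irrevocable transaction blocks other writers but still admits read-only transactions, could be modeled roughly like this (an illustrative admission check, not any real scheduler; real implementations would use blocking and real threads):

```python
# Sketch of irrevocable mode with read-only concurrency: once a
# transaction goes non-speculative (e.g. at its first I/O), new
# writers are refused, but read-only transactions may still start.

class Scheduler:
    def __init__(self):
        self.irrevocable_running = False

    def try_start(self, read_only):
        if self.irrevocable_running and not read_only:
            return False     # writers must wait for the irrevocable tx
        return True

    def start_irrevocable(self):
        self.irrevocable_running = True

s = Scheduler()
s.start_irrevocable()                     # a tx hit its first I/O
assert s.try_start(read_only=True)        # readers still proceed
assert not s.try_start(read_only=False)   # writers are stalled
```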

  4. Kimo Crossman says:

    Interesting post on STM and mutability (functional vs imperative) and why new languages are needed as we move to 100 core scenarios.

    http://enfranchisedmind.com/blog/2009/01/02/the-problem-with-stm-your-languages-still-suck/

  5. Sasha Dadiomov says:

    Hello Alex,

    Let me try to answer your latest comment… yes, if one takes the TM idea to the extreme and seeks to use it indiscriminately all over the program, assuming that it just “magically” solves all sharing problems and eliminates any need for special considerations – it will probably not work well today… so if that is what you meant, I agree. But I meant a different situation: a user who has a clear understanding of the shared-state elements in his/her design, and remembers it while writing the code. This kind of user will avoid unnecessary side effects; but in those (relatively rare) cases when he/she does want side-effecting operations inside an atomic block (one example being the desire for failure atomicity), we hope to make it possible. In some cases it will just work – for those I/O operations that are already transactional, e.g. a database like SQL Server, or a messaging system like MSMQ – these can directly participate in system transactions, integrated with TM. In other cases, the developer may use an easy-to-use tool for wrapping his/her I/O in a transactional resource manager, or just defer/compensate I/O via a TM-provided API. Clearly this does not make all cases easy; but we hope that most of them will be.

    Non-speculative mode is another way to allow I/O, but it severely restricts scalability and seems to be the last line of defense… And I agree that input does not fit the TM model; but we think pure "output" scenarios can fit usefully.

    Thanks, Sasha

  6. This week’s theme is functional programming.  Included are discussions on Software Transactional

  7. Andrew Dinn says:

    Nice post Sasha. You are right that the tool chain needs to keep up with the programming paradigm. As Danny Bobrow used to say, you have to be able to debug at the same level of abstraction at which you program.

    But the bigger implication is that the programmer has to keep up with the paradigm too, i.e. has to learn to mentally juggle the different views of what the various parallel threads in the program are seeing. Locks avoid this issue, albeit at great cost, by introducing locally sequential regions. If you want the performance benefits of not imposing this sequencing, then you have to pay the price of being able to think parallel.

    Having been involved in the implementation of both an HTM runtime and a conventional transaction service, I tend to disagree with your view that STM and conventional high-level transactions (HTX) need to be integrated. Firstly, I think this would be impractical without a fully transactional infrastructure through the whole runtime environment. But even given that, I am not sure that it is desirable.

    Implementing an HTX in a transaction service (including the thin layer which implements the client/participant component of the TX in a distributed service) requires making permanent various details of the ongoing transaction state (that’s the D part of ACID). If those changes cannot be made permanent because they are subject to completion of an STM transaction (STX) allied to the HTX, then you need some way of retaining this HTX state beyond a system crash. So the transaction service itself needs to perform actions which have effect outside of the STX, allowing it to recover the HTX in a new STX created when the system restarts. This also applies to implementations of participants such as databases, transactional web services, etc.

    That’s not to say that the transaction service cannot make good use of STM to implement itself. It’s just that forcing every HTX to correspond 1-1 with a specific STX will hamstring the HTX implementation.

  8. Sasha Dadiomov says:

    Hi Andrew, thanks for the post. Let me clarify what I meant by integration with system transactions.

    I never meant that EACH system transaction would make ALL memory operations in its scope atomic – that would be a bad idea. But I believe there will be enough cases when the developer wants that, and expresses it explicitly by requesting a combined “system + STM” transaction, or by nesting the two; an example could be maintaining a reliable memory cache of some persistent state. To integrate STM with system transactions correctly, we need to make sure that on retries caused by memory collisions, every resource manager is rolled back (it surely knows how to do that, because it participates in the 2-phase commit protocol). On the contrary, a failure of any other resource manager is final, and no retry is attempted (of course, everything is rolled back, including memory).

    The developer gets the ACID properties for all relevant resources, though only “AI” for memory (“D” makes no sense for memory). All this is possible because all transactional resources can be rolled back. The benefit: the developer gets failure atomicity across a bigger set of resources, and avoids combinatorial error processing. The application code may become shorter, cleaner, and more reliable, as it did with the advent of database transactions.
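
The commit rule Sasha describes can be sketched as a retry loop (illustrative names, not a real coordinator API): a memory conflict rolls every resource manager back and re-executes, while a failure in any resource manager is final.

```python
# Retry on memory conflict, abort finally on resource-manager failure.

def run(body, rms, max_retries=10):
    for _ in range(max_retries):
        outcome = body()                  # returns "ok" or "memory-conflict"
        if outcome == "memory-conflict":
            for rm in rms: rm.rollback()  # every RM can undo (2PC ability)
            continue                      # re-execute the whole transaction
        if all(rm.prepare() for rm in rms):
            for rm in rms: rm.commit()
            return "committed"
        for rm in rms: rm.rollback()
        return "aborted"                  # RM failure is final: no retry
    return "aborted"

class RM:
    def __init__(self, prepare_ok=True):
        self.prepare_ok = prepare_ok
        self.events = []
    def prepare(self):  self.events.append("prepare");  return self.prepare_ok
    def commit(self):   self.events.append("commit")
    def rollback(self): self.events.append("rollback")

attempts = []
def body():
    attempts.append(1)
    # first attempt hits a memory collision, the re-execution succeeds
    return "memory-conflict" if len(attempts) == 1 else "ok"

rm = RM()
assert run(body, [rm]) == "committed"
assert rm.events == ["rollback", "prepare", "commit"]   # one retry, then commit
assert run(lambda: "ok", [RM(prepare_ok=False)]) == "aborted"
```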

  9. Ihar Bury says:

    I was thinking about parallel nested transactions. They seem to me a fundamental feature that allows really deep parallelism that is not controlled from a single method of code. It allows one atomic calculation to invoke code that is itself parallelized, further abstracting it from the atomicity and parallelism of the calling code.

    However, the behavior of parallel nested transactions discussed in the article at http://www.cs.rochester.edu/meetings/TRANSACT07/papers/agrawal.pdf seems strange to me.

    1  cilk void foo() { atomic { b++;… } }
    2  cilk void bar() { atomic { …,b++ } }
    3  cilk void proc() {
    4      atomic { // transaction X
    5          b++;
    6          spawn foo(); // transaction Y1
    7          spawn bar(); // transaction Y2
    8          b++;
    9          sync;
    10     }
    11 }

    "When Y2 increments b, there is a conflict with Y1, which is running in parallel with Y2 and modifies the same variable b. Since Y2 is nested inside X, however, Y2 does not conflict with X, even though X also modifies b."

    Why do we want the "b++" at line 8 to execute in parallel with the "b++" at line 2? Isn’t the result of such an execution unpredictable and hardly useful? To me it seems much more natural to write the code like

    1  cilk void foo() { b++;… }
    2  cilk void bar() { …,b++ }
    3  cilk void proc() {
    4      atomic { // transaction X
    5          b++;
    6          spawn foo(); // transaction Y1
    7          spawn bar(); // transaction Y2
    8          b++;
    9          sync;
    10     }
    11 }

    and have it automatically working as

    1  cilk void foo() { atomic { b++;… } }
    2  cilk void bar() { atomic { …,b++ } }
    3  cilk void proc() {
    4      atomic { // transaction X
    5          b++;
    6          spawn foo(); // transaction Y1
               atomic {
    7              spawn bar(); // transaction Y2
                   atomic {
    8                  b++;
                   }
               }
    9          sync;
    10     }
    11 }

    Thus treating spawn as an execution fork that automatically starts 2 nested parallel transactions — one in the parent thread and one in the child thread.