Welcome to the Transactional Memory Team’s Blog!


If you have been using the Parallel Extensions CTP or simply writing multi-threaded code yourself, you have probably run into situations where you needed to share data between multiple threads.  So long as the data is read-only, this isn’t a problem, but what about mutating data?


The easy answer is to use a lock.  There are plenty of blog entries and white papers about how to use locks correctly, how to avoid deadlocks, which locks are best for a particular scenario, and even how to write lock-free code correctly.  You could read all of them and still run into trouble using locks.  You see, the problem isn’t sharing one piece of data; it’s sharing multiple pieces of data – for instance, data with a complex schema involving multiple complex objects such as trees or lists.


So, locks are a basic tool in your arsenal.  From the simple lock, you can build synchronization mechanisms that, you hope, protect your data correctly without hurting your scalability.


Ah, I hear you sigh.  Yes, hope is eternal: software has bugs, and multithreaded software has race conditions, deadlocks, and scalability problems.  Why?  Well, because many find it hard and fraught with peril to correctly use anything more than a single lock, or at most a really small set of coarse-grained locks.  As code matures, locking hierarchies intended to provide fine-grained locking often morph from elegant to clumsy.  You may also find that as your project grows, lock depth blossoms unnecessarily, or race conditions are introduced simply because programmers were unaware that it was necessary to lock a specific resource.  The end result is code that simply doesn’t scale, or an application whose reliability plummets, unless some of your best and brightest spend their time tuning, fixing, “right-sizing” and eliminating locks.  Even after all that work, are you confident that your code is bug-free?  Do race conditions exist in it?  How can you even inspect the code and reason about its correctness?


So, I suspect you are living with some rather coarse-grained locks – or with fear.


Many of my coworkers argue quite persuasively that you should get rid of locks by decomposing the data such that there is no sharing.  They point to the use of “agents” and “messaging” as a good programming paradigm to ensure that data accessed by a specific thread is logically isolated from other threads.  This is a great method, but can you re-architect your code to do this?  Can you use it over multiple data structures?  Can you compose them together?


The jury is still out, but I suspect that not all programming patterns in traditional object-oriented languages can be decomposed to a point where there is no sharing.  I find that advocates of the agent model still have concurrency problems!  They have shared data within their agents, and they are hampered either by requiring one thread per agent or by using locks or interlocked operations to manage this shared state.


What would you say if I told you that you could reason about your code in a sequential fashion, relieved of worrying about other code interacting with its data?  And, let’s say, in a way that scales?  Would that help you?


Well, that is what transactional memory (TM) promises to provide.


Transactional memory is not about “removing locks”; it is about abstracting away the requirement to specify a particular lock.  Instead, you structure your code in well-defined sequential blocks of code, what the database world calls “units of work”, and let the underlying runtime system, compiler, or hardware provide the guarantees you desire.  Further, you want this work to scale.  To do that, the underlying system provides concurrency control optimistically.  Rather than always locking a resource, the transactional memory system assumes that there is no contention.  It detects when that assumption is incorrect and rolls back the changes that were made in the block.  Depending on the implementation, the transactional memory system may then re-execute your block of code.
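To make the optimistic scheme concrete, here is a deliberately tiny sketch of a software TM in Java.  All names here (TVar, Txn, atomic) are invented for illustration and are not our actual API: reads record a version number, writes are buffered, and commit validates the recorded versions under a single lock, re-executing the block if validation fails.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy STM: versioned cells, buffered writes, validate-at-commit.
// TVar/Txn are invented names for illustration, not a real API.
final class TVar<T> {
    T value;
    long version;
    TVar(T v) { value = v; }
}

final class Txn {
    private static final Object commitLock = new Object();
    private final Map<TVar<?>, Long> readVersions = new HashMap<>();
    private final Map<TVar<?>, Object> writeBuffer = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T read(TVar<T> v) {
        if (writeBuffer.containsKey(v)) return (T) writeBuffer.get(v); // read-your-own-write
        synchronized (commitLock) {            // snapshot value and version together
            readVersions.putIfAbsent(v, v.version);
            return v.value;
        }
    }

    <T> void write(TVar<T> v, T newValue) {    // buffered: invisible until commit
        writeBuffer.put(v, newValue);
    }

    @SuppressWarnings("unchecked")
    private boolean tryCommit() {
        synchronized (commitLock) {
            // Validation: did anything we read change behind our back?
            for (Map.Entry<TVar<?>, Long> e : readVersions.entrySet())
                if (e.getKey().version != e.getValue()) return false;  // conflict
            // Publish the buffered writes and bump versions.
            for (Map.Entry<TVar<?>, Object> e : writeBuffer.entrySet()) {
                TVar<Object> v = (TVar<Object>) e.getKey();
                v.value = e.getValue();
                v.version++;
            }
            return true;
        }
    }

    // atomic { ... }: run the block against a fresh Txn, retry on conflict.
    static void atomic(Consumer<Txn> block) {
        while (true) {
            Txn tx = new Txn();
            block.accept(tx);
            if (tx.tryCommit()) return;        // rollback = discarding the buffer
        }
    }
}
```

A real STM must also worry about a still-running transaction briefly seeing inconsistent values before validation catches the conflict, which is one of several reasons production designs are far more involved than this sketch.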


In this way, transactional memory provides an execution pattern for multithreaded code that can be reasoned about sequentially and, when there is very little contention, scales well.  What about when there is contention?  Well, “it depends” is likely the answer.  How big the transaction is and how often it re-executes (due to contention) change the performance curves significantly.  Transactional memory systems usually include a contention manager, and that contention manager significantly affects how code scales under contention.  We would hope that the resulting performance is no worse than fully serialized execution.
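One of the simplest contention-management policies is randomized exponential backoff between re-executions.  A sketch, with the Attempt interface standing in for whatever validate-and-commit step a TM system provides (all names invented):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of one contention-management policy: randomized exponential backoff.
// "Attempt" stands in for running a transaction; run() returns true if the
// commit validated and false if it had to roll back.
final class BackoffRetry {
    interface Attempt { boolean run(); }

    static int runWithBackoff(Attempt body) {
        int retries = 0;
        long windowNanos = 1_000;                          // start: 1 microsecond
        while (!body.run()) {
            retries++;
            // Sleep a random slice of the window so contenders spread out
            // instead of colliding again in lockstep.
            long pause = ThreadLocalRandom.current().nextLong(windowNanos);
            try {
                Thread.sleep(pause / 1_000_000, (int) (pause % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();        // preserve interrupt status
                break;                                     // give up backing off
            }
            windowNanos = Math.min(windowNanos * 2, 100_000_000L); // cap: 100 ms
        }
        return retries;
    }
}
```

Contention managers in the research literature are more sophisticated than this, mixing backoff with priorities and decisions about which transaction to abort, but the backoff loop is the simplest member of the family.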


So what does this look like?  Various experimental TM systems have been produced.  Some integrate with the compiler; others, like SXM, provide library support.  When I talk about TM and show examples, I prefer to use a simple syntax which delineates a block of code as “atomic”.  For instance:


    atomic {
        myBigCollection.insert(someStuff);
        if (condition)
            otherCollection.mutate(myBigCollection);
    }


 


So, in this example I have two collections and some operation that may optionally work on one depending on the other.  Obviously, my code is demonstrating composability over these two collections.  What might not be so obvious is that I didn’t have to catch any errors here – either my code completed successfully, or an exception was thrown and nothing was changed.  How would I do that with locks today?


 


    lock (myCollectionLock) {
        bool a = false;
        bool b = false;
        try {
            myBigCollection.insert(someStuff);
            a = true;

            if (condition) {
                otherCollection.mutate(myBigCollection);
                b = true;
            }
        }
        catch (Exception) {
            if (a) myBigCollection.remove(someStuff);
            if (b) myStaticStuff.unmutate(otherCollection, myBigCollection);
            // I probably had to write my own unmutate
            throw;
        }
    } // lock


 


Which one do you want to maintain or test?  Oh, did you catch that I had to write my own ‘unmutate’?  Ugh, I hope that is even possible.  Also, did you note that I had to keep the lock through the catch?  If I didn’t, there would be a race condition between the failure occurring and the work being undone.  Heck, just setting the flags is error prone.  Consider what would have happened if I set “b = true” but forgot to correctly enclose the “if (condition)” clause.


Could your test team find this bug?



    if (condition)
        otherCollection.mutate(myBigCollection);
        b = true;



 


This blog is about transactional memory with entries written by the team at Microsoft that is implementing an experimental transactional memory system in .NET.  While we work in the Developer Division’s Parallel Computing Platform product group, we are not working on a product release at this time.  Instead we have been incubating, researching, and experimenting with transactional memory. 


Our goal is to create a TM system that lives up to its promise.  This work has led us to what we think is some industry-leading work – but we have been so busy doing it that we haven’t had a chance to talk about it, share what we have found, and, more importantly, hear what the community thinks of this effort.


This blog is our first step to remedy that.  Welcome to the Transactional Memory blog here at MSDN.

Comments (12)

  1. The Parallel Computing Platform team at Microsoft is working on much more than Parallel Extensions to

  2. JudahGabriel says:

    Wow. If you guys could pull off something like that, it would help us greatly with our server scalability.

    Looking forward to seeing what you guys come up with.

  3. barrkel says:

    How are you going to handle the composability problem of STM? If you have several transactional memory locations, modifications to each of which should logically be nested transactions of an outer transaction, there’s not going to be any help from a compiler or language to tell you about the inconsistency. Won’t you just have the same problem as locks, wherein programmers still need to know all the details of where to put their atomic blocks as opposed to their locking regions, and will typically punt to one-giant-transaction with similar serialization effects as one-giant-lock?

    How are you going to handle the I/O and general global mutable memory problem? Only transactional memory can be rolled back and retried. Once the bits have escaped I/O, there’s no getting them back.

    How are you going to deal with predictability of performance? A transaction could be stuck in a retry / abort / rollback loop for quite a while. Without seriously good visibility into this, we’re not going to be much better off.

    If people start having problems with their transactions being too large, how are you going to help them in indicating where they should be split up?

    How are you going to deal with the basic namespace / affinity problem? The problem is shared mutable state, blobs of data floating in the heap, usually. But atomic regions only help you with whatever transactional memory locations get accessed during the transaction; they don’t help the programmer reason clearly about the shared state.

    I’m interested in seeing what you present, but I do hope you don’t oversell developers on STM.

  4. Dana Groff says:

    Barrkel,

    Excellent questions, but more than I can fully answer in a quick reply to your comment.  We are planning a series of postings which, I hope, will answer your questions fully.  Feel free to keep us honest!

    A recent JavaOne presentation by Azul nicely discussed how TM does not solve all race conditions.  Indeed, you can see race conditions with database transactions as well if you rely on records not changing across transaction boundaries.  Transactions do not solve all problems, but they may ease some of the burdens you have today with locks.

    Composability is a key value proposition of transactional memory and we will go into more detail about composability in future posts.  But consider how difficult it is to update two arbitrary collections atomically.  How do you lock these collections?  In what order?  Do you just use one big lock to cover all collections?  TM provides fine-grained locking scalability without the worries and effort of coding with fine-grained locks.

    I am very aware of the hype that has surrounded STM in the past.  We have spent a lot of time working on problems like "the IO problem" and have multiple answers to that problem.  In fact, it’s a good example of a situation where there does not seem to be one answer to the problem; instead, it needs to be addressed with a menu of alternatives.

    Other issues you mention, like "transaction size", are classic problems with transactions that have the standard answer "it depends".  Specifically, it depends on your application: a long running transaction is "OK" if there is no contention but not "OK" if there is.  Our support for nested transactions with partial rollback may help a bit, but the best solution, of course, is to reduce the contention and probably reduce transaction size.  BTW, this is the same answer given for hot-spots in databases.  It seems like it is less of a technical problem for the runtime and more an educational one for the programmer.

    You also ask how to help programmers who are faced with large transactions, and what I read in that is really a question about profiling and tooling.  We have some ideas around profiling, and MSR has done some work on tooling for TM.  What do you think are the top three tools you would need to be successful with TM?

    Thank you for your comment!

    Dana
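
To make the lock-ordering question above concrete ("How do you lock these collections?  In what order?"), here is one conventional workaround in a lock-based world, sketched in Java rather than C# (a sketch, not project code): acquire the two collections' locks in a globally consistent order, e.g. by identity hash code, with a tie-breaker lock for the rare collision.  TM makes exactly this ceremony unnecessary.

```java
// Deadlock avoidance with plain locks: always acquire the two monitors in a
// globally consistent order. Ordering by System.identityHashCode is a common
// trick; a third lock serializes the rare hash-collision case.
final class OrderedLocking {
    private static final Object tieBreaker = new Object();

    static void withBoth(Object a, Object b, Runnable body) {
        int ha = System.identityHashCode(a), hb = System.identityHashCode(b);
        if (ha < hb) {
            synchronized (a) { synchronized (b) { body.run(); } }
        } else if (ha > hb) {
            synchronized (b) { synchronized (a) { body.run(); } }
        } else {
            // identityHashCode collision: serialize through the tie-breaker
            synchronized (tieBreaker) {
                synchronized (a) { synchronized (b) { body.run(); } }
            }
        }
    }
}
```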

  5. Dmitriy V'jukov says:

    You are cheating a bit here.  The code should not look like:

    Lock (myCollectionLock) {

       Bool a = false;

       Bool b = false;

       try {

           myBigCollection.insert(someStuff);

           a = true;

           if (condition) {

               otherCollection.mutate (myBigCollection);

               b = true;

       }

       Catch (exception e) {

           If (a) myBigCollection.remove(someStuff);

           If (b) myStaticStuff.unmutate(otherCollection, myBigCollection);

           // I probably had to write my own unmutate

           Throw e;

       }

    } // lock

    but:

    Lock (myCollectionLock) {

       myBigCollection.insert(someStuff);

       if (condition) {

           try {

               otherCollection.mutate (myBigCollection);

           }

           Catch (exception e) {

               myBigCollection.remove(someStuff);

               Throw e;

           }

       }

    } // lock

    No need for ‘unmutate’, no need for flags.

    And in C++/CLI it will look just like:

    Lock l (myCollectionLock);

    myBigCollection.insert(someStuff);

    remover r (myBigCollection, someStuff);

    if (condition)

       otherCollection.mutate (myBigCollection);

    r.dismiss();

  6. Dana Groff says:

    If the "mutate" operation is atomic, yes, you are correct.  I wasn’t assuming that it was atomic.  I was assuming that the mutation may have partially completed before the exception was thrown.

    Oh, and how do I guarantee that the mutation operation is atomic?  What happens if I didn’t write that code and its documentation doesn’t specify what happens in case of an error?

    Now add a third operation.  Or a loop that works over a collection or…  

    So, while this trivial example could be solved in different ways, you can see that it is simply demonstrating the underlying complexity we see in many of today’s programs: going from one "consistent" state to another "consistent" state may require more than one change.  If failures happen, you have to undo each of those changes.  You have to write code to do this (and it may not always be trivial), you have to test that code, and you have to maintain it.

  7. Dmitriy V'jukov says:

    Re: Oh, and how do I guarantee that the mutation operation is atomic?…

    It has nothing to do with multi-threading.  If the operation is not atomic, then it’s busted, and I am not able to just call it even in a single-threaded program.

    Btw, it raises an interesting question and a possible usage for TM.  Can I use TM in a single-threaded program just to “rollback” changes in case of an exception?

    I.e. let’s assume I have the code “array.sort();”.  Let’s assume sort() provides only basic exception safety, i.e. if an exception is thrown by the copy ctor of an element, then the array will stay in some intermediate state.  Well, it seems that I can just wrap some complicated operation in an atomic block to achieve strong exception safety: “atomic { array.sort(); }” – ok, now the array will be either in the initial state or in the final state, with no intermediate states any more.

    Hmm… however, it can be a very expensive means indeed, if we consider the size of the read-set and the size of the write-set.  On the other hand, it’s extremely simple.

    I am also curious about the implementation details of your STM.  Do you have plans to publish some papers, or just to describe the details in the blog?  I mean things like eager vs lazy writes, the implementation of privatization safety, the usage of Bloom filters/timestamps for validation, etc.
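
The copy-and-rollback idea in this comment has a hand-rolled single-threaded analogue worth spelling out: strong exception safety via copy-then-swap, where all mutation happens on a shadow copy that is published only on success.  A sketch in Java (the method name and failing comparator are illustrative, not library code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hand-rolled "atomic { array.sort(); }" for a single thread: do the work on
// a copy and expose it only if it completes; on failure the caller's state
// is untouched. (Illustrative sketch, not part of any TM API.)
final class CopyAndSwap {
    static <T> List<T> sortedOrOriginal(List<T> data, Comparator<T> cmp) {
        List<T> copy = new ArrayList<>(data);  // the "transaction" works on a shadow copy
        try {
            copy.sort(cmp);                    // may throw part-way through
            return copy;                       // "commit": publish the new state
        } catch (RuntimeException e) {
            return data;                       // "rollback": original list unchanged
        }
    }
}
```

As the comment notes, the TM version buys the same guarantee without this boilerplate, at the cost of tracking a read-set and write-set.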

  8. Dmitriy V'jukov says:

    Re: Btw, it raises interesting question and possible usage for TM

    I think a bit more on this.

    Do you realize that you CHANGE SINGLE-THREADED SEMANTICS?  Single-threaded semantics must be preserved… at least by default… at least as long as you are proposing TM as a replacement for locks.

    The approach taken by the Intel STM Compiler team looks much more consistent in this context – they commit the transaction in case of an exception, i.e. single-threaded semantics are preserved.

    Actually, I think that both behaviors are useful.  I.e. there must be a reasonable, intuitive behavior by default (commit on exception, just because that is the single-threaded semantics), and the user can explicitly switch to “abort-on-exception” on a per-transaction basis (because it is the required behavior in many cases, and it also allows one to get strong exception safety without a single line of code).

    What do you think?

  9. biu says:

    "Many of my coworkers argue quite persuasively that you should get rid of locks by decomposing the data such that there is no sharing.  They point to the use of “agents” and “messaging” as a good programming paradigm to ensure that data accessed by a specific thread is logically isolated from other threads.  This is a great method but can you re-architect your code to do this?  Can you use it over multiple data structures?  Can you compose them together?"

    It is possible to write a message-based parallel system which shares memory – you use pointers to shared memory in the messages.  The key here is that you use smart executors, which ensure that no two tasks write (or write & read) the same critical portion of memory at the same time.  So, you basically force serial execution of a certain subset of the whole task set.

    It’s not easy (most likely equivalent to the halting problem in general) to write such executors, and it forces you to rethink the software completely, but it does scale and compose very well, from my experience writing network servers.

  10. biu says:

    (If it’s not clear from my previous post: no locks or semaphores are used anywhere except in the low-level implementation of the general execution engine; serialization of tasks is forced by placing them in a single serial queue.)
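
The serial-queue scheme described in the two comments above can be miniaturized in a few lines, sketched here with a plain java.util.concurrent single-thread executor standing in for the "general execution engine" (the real engine is surely more elaborate): tasks that touch a given piece of shared state are funneled through one serial queue, so the tasks themselves need no locks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Serial-queue isolation in miniature: mutual exclusion comes from the queue
// executing tasks one at a time, not from locks inside the tasks.
final class SerialQueueDemo {
    static int runIncrements(int tasks) {
        int[] counter = {0};                   // shared state: only ever touched
        ExecutorService queue = Executors.newSingleThreadExecutor(); // by the queue thread
        for (int i = 0; i < tasks; i++)
            queue.execute(() -> counter[0]++); // no lock around the increment
        queue.shutdown();
        try {
            queue.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0];                     // awaitTermination gives the needed happens-before
    }
}
```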

  11. Earlier today I saw this post about implementing locks with a timeout which claimed to have a thread