Atomicity, volatility and immutability are different, part one

I get a fair number of questions about atomicity, volatility, thread safety, immutability and the like; the questions illustrate a lot of confusion on these topics. Let's take a step back and examine each of these ideas to see what the differences are between them.

First off, what do we mean by "atomic"? From the Greek ἄτομος, meaning "not divisible into smaller parts", an "atomic" operation is one which is always observed to be done or not done, but never halfway done. The C# specification clearly defines what operations are "atomic" in section 5.5. The atomic operations are reads and writes of variables of any reference type, or, effectively, any built-in value type that takes up four bytes or less, like int, short and so on. Reads and writes of variables of value types that take more than four bytes, like double, long and decimal, are not guaranteed to be atomic by the C# language.

What does it mean for a read and write of an int to be atomic?  Suppose you have static variables of type int. X is 2, Y is 1, Z is 0. Then on one thread we say:

Z = X;

and on another thread:

X = Y

Each thread does one read and one write. Each read and write is itself atomic. What is the value of Z? Without any synchronization, the threads will race. If the first thread wins then Z will be 2. If the second thread wins then Z will be 1. But Z will definitely be one of those two values, you just can't say which.

David Corbin asks in a comment to my previous entry whether immutable structs are guaranteed to be written atomically regardless of their size. The short answer is no; why would they be? Consider:

struct MyLong
{
public readonly int low;
public readonly int high;
public MyLong(low, high)
{
this.low = low;
this.high = high;
}
}

Ignore for the moment the evil that is public fields. Suppose we have a fields Q, R and S of type MyLong initialized to (0x01234567, 0x0BADF00D), (0x0DEDBEEF, 0x0B0B0B0B) and (0, 0),  respectively. On two threads we say:

S = Q;

and

Q = R;

We have two threads. Each thread does one read and one write, but the reads and writes are not atomic. They can be divided! This program is actually the same as if the two threads were:

S.low = Q.low;
S.high = Q.high;

and

Q.low = R.low;
Q.high = R.high;

Now, you can't do this because that's writing to a readonly field outside a constructor. But the CLR is the one enforcing that rule; it can break it! (We'll come back to this in the next episode; things are even weirder than you might think.) Value types are copied by value; that's why they're called value types. When copying a value type, the CLR doesn't call a constructor, it just moves the bytes over one atomic chunk at a time. In practice, maybe the jitter has special registers available that allow it to move bigger chunks around, but that's not a guarantee of the C# language. The C# language only goes so far as to guarantee that the chunks are not smaller than four bytes.

Now the threads can race such that perhaps first S.low = Q.low runs, then Q.low = R.low runs, then Q.high = R.high runs, and then S.high = Q.high runs, and hey, S is now (0x0DEDBEEF, 0x0BADF00D), even though that was neither of the original values. The values have been splinched, as Hermione Granger would say (were she a computer programmer).

(And of course, the ordering above is not guaranteed either. The CLR is permitted to copy the chunks over in any order it chooses; it could be copying high before low, for example.)

The name "MyLong" was of course no accident; in effect, a two-int readonly struct is how longs are implemented on 32 bit chips. Each operation on the long is done in two parts, on each 32 bit chunk. The same goes for doubles, the same goes for anything larger than 32 bits. If you try reading and writing longs or doubles in multiple threads on 32 bit operating systems without adding some sort of locking around it to make the operation atomic, your data are highly likely to get splinched.

The only operations that are guaranteed by the C# language to be atomic without some sort of help from a lock or other synchronization magic are those listed above: individual reads and writes of variables of the right size. In particular, operations like "increment" and "decrement" are not atomic. When you say

i++;

that's a syntactic sugar for "read i, increment the read value, write the incremented value back to i". The read and write operations are guaranteed to be atomic, but the entire operation is not; it consists of multiple atomic operations and therefore is not itself atomic. Two attempts to increment i on two different threads could interleave such that one of the increments is "lost".

There are many techniques for making non-atomic operations into atomic operations; the easiest is simply to wrap every access to the variable in question with a lock, so that it is never the case that two threads are messing with the variable at the same time. You can also use the Interlocked family of helper methods which provide atomic increment, atomic compare-and-exchange, and so on.

Have a lovely Memorial Day weekend, American readers. I'm spending my Memorial Day weekend marrying a close personal friend(*). Should be fun!

Next time: readonly inside a struct is the moral equivalent of cheque kiting, plus ways you can make the atomicity guarantees stronger or weaker.

(*) Actually, I am marrying *two* close personal friends. To each other, even!

Tags

1. Stuart says:

I'm scared of what this implies for Nullable<T>…

It implies that reads and writes of nullable struct variables are not guaranteed to be atomic. Why is that so scary? — Eric

2. Bill P. Godfrey says:

If you're going to marry someone, I would hope they are a close personal friend.

Ah, I understand. Never mind.

3. gabrielmaldi says:

X is 2, Y is 1, Z is 0. Then on one thread we say:

Z = X;

and on another thread:

X = Y

Each thread does one read and one write. Each read and write is itself atomic. What is the value of Z? Without any synchronization, the threads will race. If the first thread wins then Z will be 2. If the second thread wins then Z will be 1.

Am I wrong or Z will never be 1 given those conditions?

Cheers

By "the threads race" I mean you don't know which one runs first. If the second thread runs first and "wins the race" then X is 1. Then the second thread runs, and Z becomes 1. Is that now clear? — Eric

4. none@none.com says:

@gabrielmaldi

This is likely a typo.  If the second assignment is Z = Y, then Eric's statements continue to make sense.

5. Mike Caron says:

gabriel, imagine if

X = Y // X is now 1

ran before

Z = X  //Z is now 1

6. Shuggy says:

since, on a 64bit machine, references are 64bits long I wonder how c#,the language, guarantees that reads and writes are atomic.

We rely on the hardware. Do you have a machine in mind where reads and writes of pointer size are not atomic? — Eric

Obviously if the underlying platform has guaranteed it (for aligned accesses) then you're done, and perhaps all current platforms provide this, but how would you go about achieving this on a platform without this guarantee (do you have any?).

I'm not going to speculate about counterfactuals. — Eric

I can see why you need to do this for the references rather than everything 64bits long (it's a managed language after all) but for you (as in MSFT) to write the specification this way I assume you had a rationale for doing this in the event it cropped up (ahem)

We're assuming that no hardware manufacturer is going to be silly enough to make hardware with non-atomic reads and writes of pointer size. Is that not a reasonable assumption? — Eric

7. Andrey Titov says:

Is a wrong value the worst thing can happens with race writing? If I have a struct with a some object reference that is not 4-byte aligned because the struct starts with a single byte field, and I'm starting to write this struct simultaneously, can't I end up with invalid object reference that points at random memory location?

Reads and writes of four-byte variables are only guaranteed to be atomic if the variable is aligned. I'll be covering that in my next episode. — Eric

8. configurator says:

Shuggy: I think all platforms guarantee that register-size data is read and written atomically. And I think pointers are always register size or smaller. (I could be wrong on both statements, but I've never seen a system where either of them isn't true)

Andrey Titov: You definitely can, but you'd have to go through a hoop or two to make the struct unaligned. But don't forget, a wrong pointer and a wrong value is pretty much the same thing. Just imagine all the possible issues with this struct:

struct Slice<T> {
}

Eric: internally, how does Interlocked.Increment guarantee atomicity? Using compare/exchange or a simpler technique?

Magic! Since it does an atomic operation and generates a full memory fence, it is processor-specific how this actually works behind the scenes. — Eric

9. John says:

Bad food and ded beef?  Sounds like the burger I had last night.

10. gabrielmaldi says:

@Eric:

Thanks for your answer. The concept of "thread racing" is clear to me (or at least until you write one of your blog posts about it and knock down most of what I thought I knew, as you usually do hahaha).

What I meant is that you had (and still have) a typo there: "If the first thread wins then Z will be 2. If the second thread wins then Z will be 1". That should be "X will be 1".

Anyway, thanks for taking the time to read comments, and even answer them! I don't mean to be plaguy, I just think your posts are wonderful and try to help when I find something like this, for the benefit of the posterity hahaha.

—- No, there aren't any typos there. I must not be making myself clear. Let's start over. We have X=2, Y=1, Z=0. Then we have two threads. On thread one we execute Z=X. On thread two we execute X=Y. What are the possible final configurations?

Final configuration the first:

If the thread scheduler schedules thread one to run before thread two then we start by executing Z=X, so the state is now X=2, Y=1, Z=2. Then we run thread two, so we execute X=Y. The final state is X=1, Y=1, Z=2.

Final configuration the second:

If the thread scheduler schedules thread two to run before thread one then again we start with X=2, Y=1, Z=0, and thread two runs X=Y. The state is now X=1, Y=1, Z=0. Then thread one runs Z=X, and we have X=1, Y=1, Z=1.

Therefore the two possible final configurations are X=1, Y=1, Z=2 and  X=1, Y=1, Z=1. There is only one way that X and Y can possibly come out; they come out equal to one. But whether Z is 1 or 2 depends on the thread scheduler. That's what we mean by a "race condition".

Is that now clear? Can you explain why you thought there was a typo in there?

–Eric

11. > We're assuming that no hardware manufacturer is going to be silly enough to make hardware with non-atomic reads and writes of pointer size. Is that not a reasonable assumption?

If I understand how this works correctly, then a 16-bit CPU with 8-bit data bus, or 32-bit CPU with 16-bit memory bus, would not have atomic writes for machine word-sized values such as pointers. Such hardware existed in the past, and not as some exotic architecture – consider 8088 (16-bit CPU, 8-bit bus) and the widely popular 80386SX (32-bit CPU, 16-bit bus).

I don't think there are any popular platforms today with similar constraints, and it would be reasonable to expect the same from any future ones (except possibly for some DSPs and such).

12. Shuggy says:

@Pavel, yes, couldn't think of any that existed in a multi cpu compatible world. To people dealing with interrupt handlers this must have been problematic that's hardly an issue for anyone writing managed code specifications ðŸ™‚

13. Anthony P says:

@gabrielmaldi, no, it isn't a typo. Again, look at it. In one thread, you're saying Z = X; and the other X = Y; The question is "what is the value of Z?" The point is that you don't know if Z will be set to 2 or 1 because you don't know if X = Y happened before or after Z = X. Follow?

14. Gabe says:

The 8088 is an architecture with 32-bit addresses (16 bits for the segment plus 16 bits for the offset) and only an 8-bit data bus. To make an operation atomic on a single processor, just disable interrupts for the duration of the operation. To make an operation atomic across multiple processors (like on a MULTIBUS system), use the LOCK prefix to lock the bus so none of the other CPUs can access memory while you're executing [I believe] two consecutive instructions.

15. Shuggy says:

Hmm, my previous comment was lost in the bit bucket that is the msdn blog comment system:

quick paraphrase:

@Eric:

I asked specifically because of your phrasing: "The atomic operations are reads and writes of variables of any reference type, or, effectively, any built-in value type that takes up four bytes or less". From someone else I would treat the four byte thing as their being sloppy on the 64bit platform behaviour. but I would have expected you to say:

"The atomic operations are reads and writes of variables of any reference type, or, effectively, any built-in value type that takes up the same amount of space on the target platform when aligned correctly"

I also think this is somewhat strange be cause it could also be rephrased as:

""The atomic operations are reads and writes of variables of any reference type, or, effectively, any value type that takes up the same amount of space on the target platform when aligned correctly""

I am intrigued as to why you kept it to 4 bytes (whilst allowing references), and why you said built in (disavowing responsibility for properly aligned user defined value types)…

16. pete.d says:

@Shuggy:

He wrote it that way, because that's what the C# specification says. From section 5.5 of the C# spec:

5.5 Atomicity of variable references

Reads and writes of the following data types are atomic: bool,

char, byte, sbyte, short, ushort, uint, int, float, and reference types.

In addition, reads and writes of enum types with an underlying type

in the previous list are also atomic. Reads and writes of other types,

including long, ulong, double, and decimal, as well as user-defined

types, are not guaranteed to be atomic.

It is entirely possible that 64-bit operations on a 64-bit platform (e.g. assignments of variables of type "long") might be atomic. But you have no _guarantee_ of that, and the language and/or CLR is free to implement them such that they aren't. Without that guarantee, one is skating on thin ice writing code that assumes that behavior anyway.

17. gabrielmaldi says:

@Eric:

Yes, now it is clear. I think I just read too fast and didn't stop to analyse it carefully, as I should've done; and that's pretty much why I thought there was a typo there (reading other comments it looks like I wasn't the only one).

Thank you for making it clear!

18. Matthew Jones says:

Eric, out of curiosity… do you still have the lolgeek t-shirt that Cyrus made for you? ðŸ™‚

I sure do! — Eric

19. ferdinand swaters says:

Section 5.5  says what operations are atomic, but doesn't say what atomic is.

The spec is not a tutorial. If you're looking up the second on atomicity, the spec authors assume that you know what the word means and why it is important. I note that the spec also fails to define terms like "type", and rather assumes that the reader knows what is generally meant in a programming language by "type". The spec authors are not interested in writing a precise mathematical description of the language semantics. — Eric

Is atomic the A in ACID, or is it the I?

It's the "A". ACID stands for Atomicity, Consistency, Isolation and Durability. — Eric

20. Glad you added the Asterisk says:

It makes the whole marrying thing a whole lot clearer

21. It's interesting that C# does not guarantee atomicity of IntPtr and UIntPtr, while CLR does. I never actually noticed that mismatch before.

22. Shuggy says:

@Pavel, yes. It would be interesting to see what happened to a whole bunch of interop code if you took code compiled by the msc compiler, and ran it on a test 64bit CLR like environment that dealt with references properly but deliberately misaligned all other 64bit references that it could (without breaking normal single threaded code).

Might be feasible to hack mono to do that (but I would guess that there's tons of places that assume/enforce reasonable alignment).

23. Andrei Rinea says:

Eric, why is it 0x0DEDBEEF when it could be 0xDEADBEEF? Is it related to 0x0BADCAFE? ðŸ˜›

Anyway… 0x0B0B0B0B made me roll on the floor laughing… I just hope it was written by a woman.