Collections and threadsafety (with some perf implications)


I’ve already made the design decision that my collections will not be inherently threadsafe.  If you want to use them you will need to provide your own locking mechanism to ensure that multiple threads don’t cause a problem.  However, i do believe that there should be no issues using my collections from a pure readers persepective.  I.e. no matter how many threads you have, as long as they do not change state it should be safe to share the collection.


However, it’s unclear to me how to mark mark something as having that invariant (or whether i should mark it at all).  C++ gives this to you with the const modifier on a method.  I.e. i could mark my Empty property “const” meaning that it was allowable to query an immutable version of that object for that property.  Unfortunately, i don’t know if I’m safe adding that invariant to the interface.  What if i have a collection that lazily initializes it’s fields.  Then calling Empty on it might actually affect state.  Should the interface place these sorts of invariants on the implementation?  Or should I leave that up the the implementing classes and tell consumers of the interface that all thread safety (including multiple readers) is up to them?


Another issue comes up regarding threads and performance.  Lets take a look at my implementation of ArrayCollection.Add again:


        public virtual bool Add(A element) {


            //From CLR 369 (Dynamic Tables)


            if (array.Length == 0) {


                array = new A[1];


            } 


 


            if (count == array.Length) {


                A[] doubledArray = new A[count * 2];


                Array.Copy(array, doubledArray, count);


                array = doubledArray;


            }


 


            array[count] = element;


            count++;


 


            //Adding an element always modifies an ArrayCollection


            return true;


        }


 


Here we can see that I access the field “count” 5 different times.  However, with each access the runtime is going to have to do a memory read in order to make sure it has the up to date count value in case another thread modified it.  However, i already said that I don’t care about my collection being thread safe.  So i would actually not want the runtime to do this check.  Instead, I would like the jitter to access the “count” field once and then use that value in all the places where “count” is read.  (similaryly i would like to do the same with “array“) I could do this in the following manner:


 


        public virtual bool Add(A element) {

            int _count = count;


 


            //From CLR 369 (Dynamic Tables)


            if (array.Length == 0) {


                array = new A[1];


            }


 


 


            if (_count == array.Length) {


                A[] doubledArray = new A[_count * 2];


                Array.Copy(array, doubledArray, _count);


                array = doubledArray;


            }


 


            array[_count] = element;


            count = _count + 1;


 


            //Adding an element always modifies an ArrayCollection


            return true;


        }


 


However, this clutters up the code immensely, adds more opportunity for errors, and would be an example of premature optimization since i have no data to indicate that accessing “count” is actually a costly operation.  However, it is something to consider in the future.


Comments (4)

  1. This "optimization" isn’t needed at all. As long as the count field isn’t volatile, the runtime is allowed to read the value only once and then reuse it. The JIT will probably be able do a better job without the extra local.

  2. Jereon, are you sure about this?

    What if my code called into some function that did change the value of count? How does the runtime know that the current thread is not going to modify the value ever?

  3. I’m sure, but you are, of course, correct that method calls will tend to make the JIT reload the value. Only if the method call is inlined entirely will it be able to see that the value of count isn’t changed. So if you call any methods and you really want to be sure that you get a consistent count value, you do need to use the additional local.

    To put it in another way: Reading a non-volatile field always sees stores from the current thread that precede it, but it doesn’t necessarily see any stores from other threads, unless you have had a memory barrier (either explicit or because of synchronization).

    Here is a good link on the .NET memory model: http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375

    (but note that there is a small error in Vance’s "broken" example that actually makes it correct, see http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&D=0&P=2294)

  4. Thanks Jeroen! That’s something great to know. I’ve found the runtime’s memory model to be quite confusing (Read nonintuitive) in the past, especially with the problems you can get on AMD64 with reordered read/writes. I’m glad to get some clarification on this point.