Performance Quiz #10 — Thread local storage — Solution


I actually posted quiz #10 quite a while ago, but a comment with the correct solution came in so quickly that I wasn’t very motivated to post a followup.  There are excellent links in the comments (thank you, readers!), but now I’ll have to make the quizzes harder :)


The problem was to see what overhead is associated with various methods of creating flexible thread local storage.  I suggested two ways of having named storage.


I’ve posted a sample benchmark that expands on this and shows four different approaches (some less general than others). 
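For readers without the benchmark source at hand, here is a rough sketch of what the four access patterns might look like. It is not the benchmark itself: the class name `TlsSketch`, the slot name "counter", and the stored values are made up for illustration, and it assumes that "Numbered Slot" means a slot object allocated once and cached.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class TlsSketch
{
    // Test3 style: one dictionary per thread, keyed by name.
    [ThreadStatic] static Dictionary<string, int> perThreadDict;

    // Test4 style: the value itself lives in a thread-static field.
    [ThreadStatic] static int perThreadValue;

    // Test2 style (assumed): an unnamed slot allocated once and cached.
    static readonly LocalDataStoreSlot cachedSlot = Thread.AllocateDataSlot();

    // Stores a distinct value through each mechanism on the calling thread.
    public static void Setup()
    {
        Thread.SetData(Thread.GetNamedDataSlot("counter"), 1);
        Thread.SetData(cachedSlot, 2);
        perThreadDict = new Dictionary<string, int> { { "counter", 3 } };
        perThreadValue = 4;
    }

    // Test1 style: look the slot up by name on every access.
    public static int ReadNamedSlot()
    {
        return (int)Thread.GetData(Thread.GetNamedDataSlot("counter"));
    }

    public static int ReadCachedSlot() { return (int)Thread.GetData(cachedSlot); }
    public static int ReadDictionary() { return perThreadDict["counter"]; }
    public static int ReadDirect()     { return perThreadValue; }

    static void Main()
    {
        Setup();
        Console.WriteLine("{0} {1} {2} {3}",
            ReadNamedSlot(), ReadCachedSlot(), ReadDictionary(), ReadDirect());
    }
}
```

Each reader does strictly less work than the one before it: the named slot pays for a name lookup plus a slot read, the cached slot pays only for the slot read, the dictionary pays for a hash lookup, and the direct field pays for neither.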


On my machine I observed the following times:


Test1: Named Slot 7,991ms
Test2: Numbered Slot 4,136ms
Test3: Thread-local dictionary 2,006ms
Test4: Thread-local direct 704ms

So, what’s going on?  Well I looked into it with our profiler and got these results which show the extra costs pretty clearly.  Have a look at all the helper functions under Test1 and Test2.

Exclusive  Inclusive  Function Name
   0.39 %    89.92 %  Quiz10.Program.Main (string[])
   0.78 %    53.07 %     Quiz10.Program.Test1 ()
   0.95 %    25.19 %    |  System.LocalDataStoreMgr.GetNamedDataSlot (string)
   0.18 %    12.14 %    | |  JIT_MonReliableEnter (class Object *,bool *)
   5.76 %     8.06 %    | |  System.Collections.Hashtable.get_Item (object)
   3.05 %     3.11 %    | |  @JIT_MonExitWorker@4
   3.49 %    22.31 %    |  NativeArrayMarshalerBase::NativeArrayMarshalerBase (class CleanupWorkList *)
   0.43 %     5.97 %    | |  ThreadStore::LockDLSHash (void)
   0.14 %     5.41 %    | |  CantAllocThreads::MarkThread (void)
   0.04 %     2.80 %    | |  EEHashTableBase<int,class EEIntHashTableHelper,0>::FindItem (int)
   0.77 %     2.19 %    | |  FrameWithCookie<class HelperMethodFrame_1OBJ>::FrameWithCookie<class HelperMethodFrame_1OBJ> (void *,struct LazyMachState *,unsigned int,class Object * *)
   0.78 %     1.59 %    |  System.Threading.Thread.get_LocalDataStoreManager ()
   0.16 %     1.22 %    |  ThreadNative::GetDomainLocalStore (void)
   0.57 %     1.16 %    |  System.LocalDataStore.GetData (class System.LocalDataStoreSlot)
   0.66 %    26.72 %     Quiz10.Program.Test2 ()
   3.73 %    21.79 %    |  NativeArrayMarshalerBase::NativeArrayMarshalerBase (class CleanupWorkList *)
   0.46 %     5.79 %    | |  ThreadStore::LockDLSHash (void)
   0.18 %     5.13 %    | |  CantAllocThreads::MarkThread (void)
   0.05 %     3.13 %    | |  EEHashTableBase<int,class EEIntHashTableHelper,0>::FindItem (int)
   0.57 %     1.62 %    | |  FrameWithCookie<class HelperMethodFrame_1OBJ>::FrameWithCookie<class HelperMethodFrame_1OBJ> (void *,struct LazyMachState *,unsigned int,class Object * *)
   0.11 %     1.19 %    |  ThreadNative::GetDomainLocalStore (void)
   0.44 %     1.08 %    |  System.Threading.Thread.get_LocalDataStoreManager ()
   0.53 %     1.05 %    |  System.LocalDataStore.GetData (class System.LocalDataStoreSlot)
   0.25 %     8.43 %     Quiz10.Program.Test3 ()
   0.55 %     7.07 %    |  System.Collections.Generic.Dictionary`2.get_Item (!0)
   2.38 %     6.52 %    |    System.Collections.Generic.Dictionary`2.FindEntry (!0)
   0.20 %     1.30 %     Quiz10.Program.Test4 ()

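The table above shows all functions starting from Main with an inclusive cost >= 1% and a depth of no more than 3, so things are missing, but it’s good for discussion. Under Test1 there’s a good deal of locking and marshalling: GetNamedDataSlot alone is 25.19% inclusive, about half of that under JIT_MonReliableEnter (a monitor acquire), and the frame the profiler attributes to NativeArrayMarshalerBase adds another 22.31%, with ThreadStore::LockDLSHash beneath it. Test2 avoids the name lookup but still pays 21.79% in that same frame. Looks like there is a big oops here. The good news is that the contract is sound, so hopefully this could be addressed. But really I’m not sure why I would even bother.  The other approach, using [ThreadStatic], is much cleaner and much faster: Test3 and Test4 come in at 8.43% and 1.30% inclusive.  I don’t know why anyone would ever want to use the slots.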


For my part, rather than fix this I think I will ask that the relevant functions be deprecated; the [ThreadStatic] approach seems better in every way.  The slot methods hereby have my personal deprecation, for what that’s worth.
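If you do move from named slots to [ThreadStatic], one detail is worth keeping in mind: an initializer on a thread-static field runs only once, on whichever thread triggers the type's static initialization, so every other thread starts with null. A minimal sketch of a named per-thread store that creates its dictionary lazily on each thread could look like this. The names `PerThreadStore`, `Set`, and `Get` are hypothetical, not part of any framework API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class PerThreadStore
{
    // Deliberately no initializer: it would run on one thread only,
    // leaving this field null on every other thread.
    [ThreadStatic] static Dictionary<string, object> store;

    // Creates the calling thread's dictionary the first time it is needed.
    static Dictionary<string, object> Store
    {
        get { return store ?? (store = new Dictionary<string, object>()); }
    }

    public static void Set(string name, object value) { Store[name] = value; }

    // Returns null when the calling thread has nothing stored under this name.
    public static object Get(string name)
    {
        object value;
        return Store.TryGetValue(name, out value) ? value : null;
    }

    static void Main()
    {
        Set("who", "main");

        object seenByWorker = "unset";
        var worker = new Thread(() =>
        {
            seenByWorker = Get("who");   // the worker has its own, empty store
            Set("who", "worker");        // and writes only to that store
        });
        worker.Start();
        worker.Join();

        Console.WriteLine(seenByWorker == null);
        Console.WriteLine(Get("who"));
    }
}
```

The worker thread never sees the main thread's entry, and its own write does not disturb it, which is the same isolation the slot methods provide without the locking that shows up in the profile.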

Comments (6)

  1. jfo's coding says:

    If you’re stuffing anything in thread local storage, you might be interested in the performance comparison…

  2. Doug McClean says:

    Rico,

    Deprecation seems a bit harsh. It seems like some of the slot based methods could have application to dynamic languages and other interpreters especially(?).

  3. ricom says:

    Seriously I can’t think of any cases where it wouldn’t be better to just make your own personal Dictionary to hang on each thread.  Such a thing is still discoverable in a dynamic language if you wish it to be.

    Poking into other classes named slots — which may have been intended to be ‘private’ seems unwise at best.

    So I think to myself, why have these methods at all?

    I wouldn’t worry though, when it comes to deprecation I don’t usually get what I ask for :)

  4. Ever wonder how I get those nice looking HTML call trees with attributed costs like this one here in…