Performance Quiz #10 -- Thread local storage -- Solution
I actually posted quiz #10 quite a while ago, but a comment with the correct solution came in so quickly that I wasn't very motivated to post a followup. There are excellent links in the comments (thank you, readers!), but now I'll have to make the quizzes harder :)
The problem was to see what overhead is associated with various methods of creating flexible thread local storage. I suggested two ways of having named storage.
I've posted a sample benchmark that expands on this and shows four different approaches (some less general than others).
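For readers who don't have the benchmark handy, here is a minimal sketch of what the four access patterns might look like. This is not the benchmark source: the class and method names are hypothetical, and mapping "Numbered Slot" to an unnamed slot from Thread.AllocateDataSlot that is allocated once and cached is my assumption based on the test names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch; names are illustrative, not taken from the benchmark.
static class TlsSketch
{
    // Assumption: "numbered slot" = an unnamed slot allocated once and cached.
    static readonly LocalDataStoreSlot CachedSlot = Thread.AllocateDataSlot();

    // Each thread sees its own copy of these fields.
    [ThreadStatic] static Dictionary<string, object> localDict;
    [ThreadStatic] static int localCounter;

    // Test1 style: look the slot up by name on every access.
    public static int ViaNamedSlot()
    {
        LocalDataStoreSlot slot = Thread.GetNamedDataSlot("counter");
        return Bump(slot);
    }

    // Test2 style: reuse a slot object that was allocated up front.
    public static int ViaCachedSlot()
    {
        return Bump(CachedSlot);
    }

    // Test3 style: a per-thread dictionary gives named storage without slots.
    public static int ViaThreadStaticDictionary()
    {
        if (localDict == null)
            localDict = new Dictionary<string, object>();

        object boxed;
        localDict.TryGetValue("counter", out boxed);
        int next = (boxed == null ? 0 : (int)boxed) + 1;
        localDict["counter"] = next;
        return next;
    }

    // Test4 style: a dedicated per-thread field, no lookup and no boxing.
    public static int ViaThreadStaticDirect()
    {
        return ++localCounter;
    }

    // Shared helper: read the boxed int in the slot, add one, store it back.
    static int Bump(LocalDataStoreSlot slot)
    {
        object boxed = Thread.GetData(slot);
        int next = (boxed == null ? 0 : (int)boxed) + 1;
        Thread.SetData(slot, next);
        return next;
    }
}

static class Program
{
    static void Worker()
    {
        int id = Thread.CurrentThread.ManagedThreadId;
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine("thread {0}: named={1} cached={2} dict={3} direct={4}",
                id,
                TlsSketch.ViaNamedSlot(),
                TlsSketch.ViaCachedSlot(),
                TlsSketch.ViaThreadStaticDictionary(),
                TlsSketch.ViaThreadStaticDirect());
        }
    }

    static void Main()
    {
        // Run the same worker on two threads, one after the other.
        Thread first = new Thread(Worker);
        first.Start();
        first.Join();

        Thread second = new Thread(Worker);
        second.Start();
        second.Join();
    }
}
```

All four methods keep a separate counter per thread, so the second worker should start counting from scratch rather than continuing where the first left off. The difference between them is only in how much work it takes to reach the storage on each access.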
On my machine I observed the following times:
Test1: Named Slot 7,991ms
Test2: Numbered Slot 4,136ms
Test3: Thread-local dictionary 2,006ms
Test4: Thread-local direct 704ms
So, what's going on? Well, I looked into it with our profiler and got these results, which show the extra costs pretty clearly. Have a look at all the helper functions under Test1 and Test2.
The table above shows all functions starting from Main with an inclusive cost >= 1% and a depth of no more than 3 -- so things are missing, but it's good enough for discussion. Under Test1 there's a good deal of locking and marshalling... looks like there is a big oops here. The good news is that the contract is sound, so hopefully this could be addressed. But really, I'm not sure why I would even bother. The other approach, using [ThreadStatic], is much cleaner and much faster. I don't know why anyone would ever want to use the slots.
For my part, rather than fix this, I think I will ask that the relevant functions be deprecated -- the [ThreadStatic] approach seems better in every way. The slot methods hereby have my personal deprecation, for what that's worth.