Fast Switching with LINQ


Jomo Fisher—I’ve run into a performance problem several times in my career for which I never found a truly satisfying answer, until now. The problem can be distilled down to this:


Look up a value based on a string key. The set of strings is fixed and no unknown key values can occur.


Consider my personal rankings of the Seven Dwarves on a scale of 1-10:


                happy=9, sneezy=2, doc=7, sleepy=8, dopey=9, grumpy=2, bashful=6


I need a function that takes a name and returns a ranking. The default solution for this problem should be to use a Dictionary or Hashtable. The implementations that ship with the .NET Framework are extremely fast.


However, these container classes offer more features than I need for this problem. New keys can be added, but my set is fixed. They can tell you whether a key is valid, but I know a priori that the keys are limited to a fixed set. They can enumerate the set of keys or the set of values, but I don’t need this capability.


Any time I see a data structure with a capability I’m not using, I wonder whether I can trade that capability for something I do need—in this case, a speed boost.


If I were really desperate for a performance gain, I could write a function like:


        static int Lookup(string s) {


            if (s.Length < 6) {


                if (s.Length < 5) return 7;


                else return 9;


            } else {


                if (s[0] < 's') {


                    if (s[0] < 'g') return 6;


                    else return 2;


                } else {


                    if (s[1] < 'n') return 8;


                    else return 2;


                }


            }


        }


Notice that we only have to look at the length and the first two characters of the string to make the decision. This function is fast but nearly impossible to maintain or debug. Also, I have to know the set of strings at compile time. All of these drawbacks make this approach significantly less useful.


Enter the LINQ expression compiler that will be available in .NET Framework 3.5. With the new expression support it’s possible to build an expression at runtime and then compile it into straight IL (which will be jitted into machine code). It turned out to be pretty easy to write a switch compiler that generates an appropriate lookup expression on the fly. I wrote a class called SwitchCompiler to do the heavy lifting. Here’s how it’s called:


            var dict = new Dictionary<string, int> {


                    {"happy", 9}, {"sneezy", 2},


                    {"doc", 7}, {"sleepy", 8},


                    {"dopey", 9}, {"grumpy", 2},


                    {"bashful", 6}


                };


            var Lookup = SwitchCompiler.CreateSwitch(dict);


            var v = Lookup("doc");


            Debug.Assert(v == 7);


SwitchCompiler first builds up an expression that looks like this:


s.Length < 6 ? (s.Length < 5 ? 7 : 9) :


                (s[0] < 's' ? (s[0] < 'g' ? 6 : 2) : (s[1] < 'n' ? 8 : 2));


And then compiles it into a delegate, which is returned as Lookup. (It should also be possible to generate a minimal perfect hash in the Compile method. I haven’t tried this.)


The actual implementation of SwitchCompiler is about 65 lines of code. You can see it in the attachment.


On my machine, looking up each value in a tight loop 1 million times takes 70ms while the corresponding loop using Dictionary takes 697ms. That’s close to a 900% performance improvement.


This posting is provided “AS IS” with no warranties, and confers no rights.


SwitchCompiler.cs

Comments (33)

  1. Jeff says:

    That’s pretty impressive performance for a VM-based language.  The C++ equivalent, 1 million iterations, runs in about 55 ms on a Pentium 4, 3.4 GHz.  Here’s the code:

    struct KeyValue {

     const char* key;

     int value;

     inline bool operator < (const KeyValue& that) const {

       return strcmp(this->key, that.key) < 0;

     }

    };

    static const KeyValue kvs[] = {

     {"bashful", 6},

     {"doc", 7},

     {"dopey", 9},

     {"grumpy", 2},

     {"happy", 9},

     {"sleepy", 8},

     {"sneezy", 2},

    }, * const kvs_end = kvs + (sizeof(kvs) / sizeof(0[kvs]));

    inline int Lookup(const char* s) {

     KeyValue key = { s };

     return std::lower_bound(kvs, kvs_end, key)->value;

    }

    int _tmain(int argc, _TCHAR* argv[])

    {

     LARGE_INTEGER begin, end, frequency;

     QueryPerformanceCounter(&begin);

     const KeyValue* k = kvs;

     int total = 0;  // Just so the compiler doesn’t completely optimize out the for loop.

     for (int i = 0; i < 1000000; ++i) {

       total  += Lookup(k->key);

       if (++k == kvs_end)

         k = kvs;

     }

     QueryPerformanceCounter(&end);

     QueryPerformanceFrequency(&frequency);

     __int64 ms = (end.QuadPart - begin.QuadPart) * 1000 / frequency.QuadPart;

     cout << "Total:\t" << total << endl;

     cout << "Milliseconds:\t" << ms << endl;

    }

    One minor benefit of the C++ code is that there is no overhead of creating a lookup table before the first call to Lookup().

  2. Jomo Fisher says:

    Hi Jeff,

    I tried your code on my machine and amazingly got exactly 55ms. That makes your orange look a little more like my apple.

    Your comment made me realize my implementation has one more feature that I could possibly throw away: there’s an array bounds check on the string index operations. I think I could get rid of this by emitting 'unsafe' code. Unfortunately, I think I’d have to bypass the expression compiler to do this.

    I seem to be within spitting distance of optimized C++, which is pretty cool considering how amazing the VS C++ optimizer is.

    Jomo

  3. Jomo Fisher says:

    Here’s my test code in case anyone wants to reproduce my result:

    /*

    Use of included script samples are subject to the terms specified at http://www.microsoft.com/resources/sharedsource/licensingbasics/permissivelicense.mspx

    Written by Jomo Fisher

    */

    using System;

    using System.Collections.Generic;

    using System.Linq;

    using System.Text;

    using System.Reflection;

    using FastSwitch;

    namespace ConsoleApplication1 {

       class Program {

           static void Main(string[] args) {

               var dict = new Dictionary<string, int> {

                       {"happy", 9}, {"sneezy", 2},

                       {"doc", 7}, {"sleepy", 8},

                       {"dopey", 9}, {"grumpy", 2},

                       {"bashful", 6}

                   };

               const int count = 1000000;

               DateTime start = DateTime.Now;

               for (int i = 0; i < count; ++i) {

                   var v = dict["happy"];

                   v = dict["sneezy"];

                   v = dict["doc"];

                   v = dict["sleepy"];

                   v = dict["dopey"];

                   v = dict["grumpy"];

                   v = dict["bashful"];

               }

                TimeSpan elapsed = DateTime.Now - start;

               Console.WriteLine("Dictionary: {0} ms", elapsed.TotalMilliseconds);

               var compiled = SwitchCompiler.CreateSwitch(dict);

               start = DateTime.Now;

               for (int i = 0; i < count; ++i) {

                   var v = compiled("doc");

                   System.Diagnostics.Debug.Assert(v == 7);

                   v = compiled("happy");

                   System.Diagnostics.Debug.Assert(v == 9);

                   v = compiled("sneezy");

                   System.Diagnostics.Debug.Assert(v == 2);

                   v = compiled("sleepy");

                   System.Diagnostics.Debug.Assert(v == 8);

                   v = compiled("dopey");

                   System.Diagnostics.Debug.Assert(v == 9);

                   v = compiled("grumpy");

                   System.Diagnostics.Debug.Assert(v == 2);

                   v = compiled("bashful");

                   System.Diagnostics.Debug.Assert(v == 6);

               }

                elapsed = DateTime.Now - start;

               Console.WriteLine("Compiled: {0} ms", elapsed.TotalMilliseconds);

           }

       }

    }

  4. You’ve been kicked (a good thing) – Trackback from DotNetKicks.com

  5. tom says:

    Looks like you want a perfect hash or a minimal perfect hash? See e.g. http://burtleburtle.net/bob/hash/perfect.html

    I don’t know about a c# implementation though.

  6. Justin says:

    The C++ code and C# code are timing two different things.

    The C++ code cycles through each key for a total of 1,000,000 lookups; the C# code looks each key up 1,000,000 times for a total of 7,000,000 lookups.

  7. Jomo Fisher says:

    Justin,

    You’re right. I should have noticed the C++ code was doing 1/7th the lookups of the C# code. When I adjust the C++ code, it dutifully reports 385 ms, which is pretty close to what the plain old .NET dictionary was doing.

    So I guess the LINQ algorithm is in the catbird seat, at least for the time being.

    I also noticed the C++ answer requires that the set of keys be sorted alphabetically before lookups can commence, so there *is* a little preparation time.

  8. Jomo Fisher says:

    Tom,

    I had considered a minimal perfect hash (and I had even looked at the exact page you sent me). I still think the LINQ algorithm should perform better in many cases because that perfect hashing algorithm is O(n) in the number of bytes in the input key, while the LINQ algorithm is O(log N) in the number of keys. In particular, I think the LINQ algorithm will outpace the other when the keys are long and their lengths are non-uniform.

    Of course, big-O notation can hide a potentially large constant factor, but I don’t really see why it would in this case.

    Also, keep in mind that that particular minimal perfect hash implementation is throwing away a feature that I wanted to keep: it can’t easily accommodate key sets that are available only at run time.

  9. Jeff says:

    The C++ compiler optimized out the entire loop in my first version.  I added the total variable, and it still optimized out the whole loop.  I added the cout statement at the bottom and it finally executed the code.

    Any sign the C# compiler is optimizing out the whole loop in the C# code?

  10. Jomo Fisher says:

    The C# compiler is definitely not removing the loop–for one, it has no way to determine whether the delegate has side-effects that need to be preserved.

    This can be seen by increasing the number of iterations–2M loops run in twice the time as 1M.

  11. Jeff says:

    Wow, your switch compiler and the .NET VM are very impressive indeed.  My fastest run of the following C++ code was 88 ms.  Note also that I kind of cheated and used std::find instead of std::lower_bound, because when there are only 1 or 2 elements in the vector, std::find runs much faster.

    #include "stdafx.h"

    struct KeyValue {

     const char* key;

     int value;

    };

    struct CharValue {

     int char_;

     bool terminal_;

     union {

       int value_;

       size_t next_index_;

       vector<CharValue>* next_;

     };

     ptrdiff_t char_step_;

    };

    inline bool operator < (const CharValue& value, int c) {

     return value.char_ < c;

    }

    inline bool operator < (int c, const CharValue& value) {

     return c < value.char_;

    }

    inline bool operator == (const CharValue& value, int c) {

     return value.char_ == c;

    }

    static const KeyValue kvs[] = {

     {"bashful", 6},

     {"doc", 7},

     {"dopey", 9},

     {"grumpy", 2},

     {"happy", 9},

     {"sleepy", 8},

     {"sneezy", 2},

    }, * const kvs_end = kvs + (sizeof(kvs) / sizeof(0[kvs]));

    class KnownSetTrie {

    public:

     typedef vector<CharValue> CharMap;

     vector<CharMap> tree_;

     ptrdiff_t first_step_;

     template <typename Iter>

     KnownSetTrie(Iter begin, Iter end) {

        first_step_ = Compile(begin, end, 0) - 1;

       // Now that the tree_ is stable, convert indices to node pointers.

       for(vector<CharMap>::iterator i = tree_.begin(); i != tree_.end(); ++i) {

         for (CharMap::iterator j = i->begin(); j != i->end(); ++j)

           if (!j->terminal_)

             j->next_ = &tree_[j->next_index_];

       }

     }

     template <typename Iter>

     ptrdiff_t Compile(Iter begin, Iter end, ptrdiff_t char_index) {

       CharMap map;

       Iter start = begin, i = begin + 1;

       while (true) {

         CharValue value;

         value.char_ = start->key[char_index];

         if (i != end && i->key[char_index] == start->key[char_index]) {

           do {

             ++i;

           } while (i != end && i->key[char_index] == start->key[char_index]);

           value.terminal_ = false;

           value.char_step_ = Compile(start, i, char_index + 1);

            value.next_index_ = tree_.size() - 1;

         } else {

           value.terminal_ = true;

           value.value_ = start->value;

         }

         map.push_back(value);

         if (i == end)

           break;

         start = i++;

       }

       if (1 == map.size() && !map.front().terminal_) {

         // Pass through node:

         return map.front().char_step_ + 1;

       }

       tree_.push_back(map);

       return 1;

     }

     inline int Lookup(const char* s) const {

       const CharMap* node = &tree_.back();

       s += first_step_;

       while (true) {

         const CharMap::const_iterator value = find(node->begin(), node->end(), *s);

         if (value->terminal_)

           return value->value_;

         s += value->char_step_;

         node = value->next_;

       }

     }

    };

    int _tmain(int argc, _TCHAR* argv[])

    {

     KnownSetTrie trie(kvs, kvs_end);

     LARGE_INTEGER begin, end, frequency;

     QueryPerformanceCounter(&begin);

     int total = 0;  // Just so the compiler doesn’t completely optimize out the for loop.

     for (int i = 0; i < 1000000; ++i) {

       for (const KeyValue* k = kvs; k != kvs_end; ++k)

         total  += trie.Lookup(k->key);

     }

     QueryPerformanceCounter(&end);

     QueryPerformanceFrequency(&frequency);

     __int64 ms = (end.QuadPart - begin.QuadPart) * 1000 / frequency.QuadPart;

     cout << "Total:\t" << total << endl;

     cout << "Milliseconds:\t" << ms << endl;

    }

  12. It seems the LINQ guys have done a great job of optimization inside of the LINQ namespace (System.Linq)….

  13. Jeff says:

    I got the C++ code down to 70 ms.  It took a __fastcall and some fun struct packing, though.

    I can’t wait to use C# .Net!

    #include "stdafx.h"

    struct KeyValue {

     const char* key;

     int value;

    };

    struct CharValue {

     int char_ : 8;

     int terminal_ : 1;

     int char_step_ : 23;

     union {

       int value_;

       size_t next_index_;

       vector<CharValue>* next_;

     };

    };

    inline bool operator < (const CharValue& value, int c) {

     return value.char_ < c;

    }

    inline bool operator < (int c, const CharValue& value) {

     return c < value.char_;

    }

    inline bool operator == (const CharValue& value, int c) {

     return value.char_ == c;

    }

    static const KeyValue kvs[] = {

     {"bashful", 6},

     {"doc", 7},

     {"dopey", 9},

     {"grumpy", 2},

     {"happy", 9},

     {"sleepy", 8},

     {"sneezy", 2},

    }, * const kvs_end = kvs + (sizeof(kvs) / sizeof(0[kvs]));

    class KnownSetTrie {

    public:

     typedef vector<CharValue> CharMap;

     vector<CharMap> tree_;

     ptrdiff_t first_step_;

     CharMap* root_;

     template <typename Iter>

     KnownSetTrie(Iter begin, Iter end) {

        first_step_ = Compile(begin, end, 0) - 1;

       // Now that the tree_ is stable, convert indices to node pointers.

       for(vector<CharMap>::iterator i = tree_.begin(); i != tree_.end(); ++i) {

         for (CharMap::iterator j = i->begin(); j != i->end(); ++j)

           if (!j->terminal_)

             j->next_ = &tree_[j->next_index_];

         root_ = &tree_.back();

       }

     }

     template <typename Iter>

     ptrdiff_t Compile(Iter begin, Iter end, ptrdiff_t char_index) {

       CharMap map;

       Iter start = begin, i = begin + 1;

       while (true) {

         CharValue value;

         value.char_ = start->key[char_index];

         if (i != end && i->key[char_index] == start->key[char_index]) {

           do {

             ++i;

           } while (i != end && i->key[char_index] == start->key[char_index]);

           value.terminal_ = false;

           value.char_step_ = Compile(start, i, char_index + 1);

            value.next_index_ = tree_.size() - 1;

         } else {

           value.terminal_ = true;

           value.value_ = start->value;

         }

         map.push_back(value);

         if (i == end)

           break;

         start = i++;

       }

       if (1 == map.size() && !map.front().terminal_) {

         // Pass through node:

         return map.front().char_step_ + 1;

       }

       tree_.push_back(map);

       return 1;

     }

     int __fastcall Lookup(const char* s) const {

       const CharMap* node = root_;

       s += first_step_;

       while (true) {

         const CharMap::const_iterator value = find(node->begin(), node->end(), *s);

         if (value->terminal_)

           return value->value_;

         s += value->char_step_;

         node = value->next_;

       }

     }

    };

    int _tmain(int argc, _TCHAR* argv[])

    {

     KnownSetTrie trie(kvs, kvs_end);

     LARGE_INTEGER begin, end, frequency;

     QueryPerformanceCounter(&begin);

     int total = 0;  // Just so the compiler doesn’t completely optimize out the for loop.

     for (int i = 0; i < 1000000; ++i) {

       for (const KeyValue* k = kvs; k != kvs_end; ++k)

         total += trie.Lookup(k->key);

     }

     QueryPerformanceCounter(&end);

     QueryPerformanceFrequency(&frequency);

     __int64 ms = (end.QuadPart - begin.QuadPart) * 1000 / frequency.QuadPart;

     cout << "Total:\t" << total << endl;

     cout << "Milliseconds:\t" << ms << endl;

    }


  15. Jomo Fisher says:

    Hey Jeff,

    That’s a really nice solution. I was wondering whether a trie could help here.

  16. lb says:

    Okay, this is insane. How could you rate grumpy and sneezy as a 2? Unless of course 1 is the best and 10 is the worst.

    This "SwitchCompiler.CreateSwitch" is absolutely amazing. It’s a best of both world’s sort of solution — we can focus on the high level problem, and have Linq provide optimisations that would otherwise be utterly unmaintainable and difficult to perform manually.

    Brilliant stuff. Thank you.

    lb

  17. Recently my time has been taken up with a series of internal issues involving Beta1, C# Samples and various

  18. Jeff says:

    A faster solution than a trie is a hand-built binary search tree, which is what I imagine the case-statement compiler does.  This code runs in around 55 ms, and doesn’t cheat like my earlier code with __fastcall or std::find() in place of std::lower_bound():

    Sorry for turning this into a language shoot-out, but this looked like such an interesting case to play with.  And again the .NET runtime performs astoundingly well.

    #include "stdafx.h"

    struct KeyValue {

     const char* key;

     int value;

    };

    static const KeyValue kvs[] = {

     {"bashful", 6},

     {"doc", 7},

     {"dopey", 9},

     {"grumpy", 2},

     {"happy", 9},

     {"sleepy", 8},

     {"sneezy", 2},

    }, * const kvs_end = kvs + (sizeof(kvs) / sizeof(0[kvs]));

    struct DTreeNode;

    struct DTreeStep {

     bool terminal;

     union {

       int retval;

       int char_step;

     };

     union {

       DTreeNode* next;

       size_t next_index;

     };

    };

    struct DTreeNode {

     char c;

     DTreeStep less_step, ge_step;

    };

    class DTree {

    public:

     vector<DTreeNode> nodes_;

     DTreeStep root_step_;

     template <typename Iter>

     DTree(Iter begin, Iter end) {

       root_step_ = Compile(begin, end, 0);

       // Convert indices into pointers.

       for (vector<DTreeNode>::iterator i = nodes_.begin(); i != nodes_.end(); ++i) {

         if (!i->less_step.terminal)

           i->less_step.next = &*(nodes_.begin() + i->less_step.next_index);

         if (!i->ge_step.terminal)

           i->ge_step.next = &*(nodes_.begin() + i->ge_step.next_index);

       }

       root_step_.next = &*(nodes_.begin() + root_step_.next_index);

     }

     template <typename Iter>

     DTreeStep Compile(Iter begin, Iter end, ptrdiff_t char_index) {

        ptrdiff_t count = end - begin;

       DTreeStep step;

       if (1 == count) {

         step.terminal = true;

         step.retval = begin->value;

         return step;

       }

        Iter middle = begin + (end - begin) / 2,

         middle_begin = middle, middle_end = middle + 1;

       while (true) {

         if (middle_begin == begin)

           break;

          Iter prior = middle_begin - 1;

         if (prior->key[char_index] == middle->key[char_index])

           middle_begin = prior;

         else

           break;

       }

       while (true) {

         if (middle_end == end)

           break;

         if (middle_end->key[char_index] == middle->key[char_index])

           ++middle_end;

         else

           break;

       }

        const ptrdiff_t less_count = middle_begin - begin,

          equal_count = middle_end - middle_begin,

          greater_count = end - middle_end;

       if (0 == less_count) {

         if (0 == greater_count) {

           // All middles, pass through

           step = Compile(begin, end, char_index + 1);

           step.char_step += 1;

           return step;

         } else {

           // middles and greaters

           middle_begin = middle_end;

         }

       }

       DTreeNode node;

       node.c = middle_end->key[char_index];

       node.less_step = Compile(begin, middle_begin, char_index);

       node.ge_step = Compile(middle_begin, end, char_index);

       step.terminal = false;

       step.char_step = 0;

       step.next_index = nodes_.size();

       nodes_.push_back(node);

       return step;

     }

     int Lookup(const char* s) const {

       const DTreeStep* step = &root_step_;

       while (true) {

         const DTreeNode* node = step->next;

         step = *s < node->c ? &node->ge_step : &node->less_step;

         if (step->terminal)

           return step->retval;

         s += step->char_step;

       }

     }

    };

    int _tmain(int argc, _TCHAR* argv[])

    {

     DTree dtree(kvs, kvs_end);

     LARGE_INTEGER begin, end, frequency;

     QueryPerformanceCounter(&begin);

     int total = 0;  // Just so the compiler doesn’t completely optimize out the for loop.

     for (int i = 0; i < 1000000; ++i) {

       for (const KeyValue* k = kvs; k != kvs_end; ++k)

         total += dtree.Lookup(k->key);

     }

     QueryPerformanceCounter(&end);

     QueryPerformanceFrequency(&frequency);

     __int64 ms = (end.QuadPart - begin.QuadPart) * 1000 / frequency.QuadPart;

     cout << "Total:t" << total << endl;

     cout << "Milliseconds:t" << ms << endl;

    }

  19. Zan says:

    Ahh, I have no idea how you lot managed to get such results on 3.4 GHz machines for the standard dictionary tests.

    Here’s my code, using the standard generic dictionary, and I get without fail 92-93 ms lookup times for 1 million loops.

    I haven’t tried the LINQ version on my machine as I’m only running .NET 2.0, but even comparing against the LINQ results here (even though this is distorted by the difference in machines used), the difference is a max of 20% — nothing like the 900% claimed.

    By the way, my tests were all run on a P4 2.4 GHz machine; I’m sure using a CPU that clocks 30% faster will also reduce the performance gap between my results and those posted in the original statement.

    PS. Don’t use DateTime for monitoring performance, as DateTime is not updated every tick; my experience is that DateTime has a +/- difference of 30-40 ms on a good day vs the Stopwatch class.

    Anyway, here’s the code I ran my tests with.

    Dictionary<string, int> index = new Dictionary<string, int>();

                index.Add("happy", 9);

                index.Add("sneezy", 2);

                index.Add("doc", 7);

                index.Add("sleepy", 8);

                index.Add("dopey", 9);

                index.Add("grumpy", 2);

               int found = 0;            

               Stopwatch sw = new Stopwatch();

               sw.Start();

               for (int i = 0; i < 1000000; i++)

                    found = index["doc"];

               long elapsed = sw.ElapsedMilliseconds;

               sw.Stop();

               sw = null;

  20. Jomo Fisher says:

    Zan,

    Your test is only looking up one value in a loop. Our tests are looking up seven values in a loop. You can see the actual test code in the comments above. If you adjust your test to match you’ll get ~650ms.

  21. Gilles Michard says:

    I found that using KeyValuePair<string,TR> instead of Case<TR> allows the CreateSwitch function to be exposed with an IEnumerable<KeyValuePair<string,TR>> parameter.

    this allows these uses:

           public static Func<string, TR> CreateSwitch<TR>(this IEnumerable<string> strings, Func<string, TR> func)

           {

               return CreateSwitch(from s in strings select new KeyValuePair<string, TR>(s, func(s)));

           }

           public static Func<string, IEnumerable<TR>> EnumerableInvert<TR>(this IEnumerable<TR> values, Func<TR, string> name)

           {

               return CreateSwitch(from v in values group v by name(v) into byname select new KeyValuePair<string, IEnumerable<TR>>(byname.Key, byname));

           }

    I added the following check in the function:

           public static Func<string, TR> CreateSwitch<TR>(this IEnumerable<KeyValuePair<string, TR>> caseDict)

           {

               if (caseDict.Count() != caseDict.Distinct().Count())

                   throw new ArgumentException("duplicate names");

               var cases = caseDict.ToArray();

               var p = Expression.Parameter(typeof(string), "p");

               var expr = Expression.Lambda<Func<string, TR>>(

                   BuildStringLength(p, cases.ToOrderedArray(s => s.Key.Length), 0, cases.Length - 1),

                   new ParameterExpression[] { p }

               );

               var del = expr.Compile();

               return del;

           }

           public static Func<string,TR> Invert<TR>(this IEnumerable<TR> values, Func<TR, string> name)

           {

               return CreateSwitch(from v in values group v by name(v) into byname select new KeyValuePair<string, TR>(byname.Key, byname.First()));

           }

           public static Func<string, TR> ParseEnum<TR>()

               //where TR: Enum

           {

               return CreateSwitch(from v in (TR[])Enum.GetValues(typeof(TR)).OfType<TR>() select new KeyValuePair<string, TR>(Enum.GetName(typeof(TR), v), v));

           }

  22. Rico Mariani has been posting about LINQ to SQL perfomance and has finally posted the performance numbers


  24. Jomo Fisher—A lot of people (myself included) have written about LINQ in the next version of C#. LINQ

  25. Raptor-75 says:

    Great stuff Jomo!

    Unfortunately it doesn’t work with this dictionary:

    {"Ado", i++},

    {"A2o", i++},

    {"A2i", i++},

    {"B2o", i++},

    {"AdoX", i++},

    {"AdPX", i++}

    It crashes with an IndexOutOfRangeException in BuildStringChar() because it gets a set of strings that are of different lengths. It seems to me that both your if statements that deal with middle are incorrect. If I replace middle with upper in those and in both calls to FirstDifferentDown() it almost works.

    Another problem I had to fix is that ToOrderedArray() orders the strings in a way that is not consistent with the Char comparisons in the resulting delegate. In this case it breaks down because 'o' < 'P' is false while "AdoX".CompareTo("AdPX") is -1. My solution was to make a new ToOrderedArray() that takes an IComparer<string> and passes it on to OrderBy(). I then pass it the Instance member from this class:

    internal class StringCharComparer : IComparer<string>

    {

       public static StringCharComparer Instance = new StringCharComparer();

       public int Compare(string x, string y)

       {

           for (int i = 0; i < Math.Min(x.Length, y.Length); i++)

           {

               if (x[i] > y[i])

                   return 1;

               else if (x[i] < y[i])

                   return -1;

           }

           if (x.Length == y.Length)

               return 0;

           else if (x.Length > y.Length)

               return 1;

           else

               return -1;

       }

    }

    By the way, it was pure luck that the dictionaries I tried exposed these bugs.

    Thoughts?

  26. Jomo Fisher says:

    Raptor, I think you’re right on both counts. Thanks for pointing this out.

  27. Some One says:

    Nice. One thing I did was add the optimized Lookup into the testing for comparison. This is where one decides whether the lookup set is dynamic and whether the dictionary will change over time. Even though the optimized Lookup is hard to maintain and debug, that would be OK if the list were static and did not change. But if it does change, then one has a choice between building a compiled expression and doing it the conventional way. The new idea is to take a conventional way of doing something and turn it into an expression that can be compiled, which may yield performance gains. I like how you take something as simple as a lookup, which one might assume is already optimized, and show that by building an optimized expression the performance can be even faster.

    Dictionary: 700.07 ms  (~7.8x slower than Lookup, ~4.1x slower than Compiled)

    Compiled: 170.017 ms   (~1.9x slower than Lookup)

    Lookup: 90.009 ms

    Thanks for the expression-building logic; it shows how the various static methods in System.Linq.Expressions are used to build a dynamic function.

  28. Paul Westcott says:

    Nothing really to add of value, but kind of interesting. If you define the following class:

    class EqualityCompareStrings : IEqualityComparer<string>

    {

       public bool Equals(string x, string y) { return string.Equals(x, y); }

       public int GetHashCode(string obj) { return obj.GetHashCode(); }

    }

    And then create the dictionary with an instance of that class like

    Dictionary<string, int> dict = new Dictionary<string, int>(new EqualityCompareStrings());

    Then you get (from my test!) a time on the non-SwitchCompiler code of about 80% of the original.

    So as I said, nothing really to do with anything much, although considering how often strings are used as keys it might be a trivial optimization to put into System.Collections.Generic.EqualityComparer<T>.CreateComparer() to handle it as a special case…

    Assuming that I haven’t done anything wrong 🙂

    (But then you might as well use Jomo’s special SwitchCompiler, unless you want to keep using Dictionary<string, …>.)

  29. Gustavo Guerra says:

    Hello

    I’ve also hit the bug that Raptor-75 pointed out. Using his StringCharComparer didn’t solve it all. If you have a complete fix, can you please post it?

    Best Regards,

    Gustavo Guerra

  30. On my current project, we’re in a ‘make it faster’ mode right now. We’ve already knocked out a

  31. Code Beside says:

    StaticStringDictionary – Fast Switching with LINQ revisited

  32. Jomo Fisher—A while back I wrote about using LINQ to get close-to-the-metal performance in a particular