Fun with Generics in Whidbey


I thought I’d share a little of my “app building” fun with you folks. Just for giggles, I wrote a command-line tool that lists the most frequent commenters on CLR-related blogs. Most of the code is screen scraping and not that interesting (I understand that Scott is working on some great new web services to make this easy as pie). But one bit uses the new generic collections, and I thought that would be interesting to share with you.


 


All of this is on a current Whidbey build, so no promises about this working on the PDC bits.


 


Part 1: The tally. In this section of code I keep a count of how many times each person has commented. I love the strong typing generics give me; notice how pretty this line looks:


tally[name]++;


It is unfortunate that I have to have a special case for the first time an entry is added, but I think that makes sense for a general purpose collection such as Dictionary.  


 


Dictionary<string, int> tally = new Dictionary<string, int>();
List<Uri> entries = GetEntryUris(DateTime.Parse("2/01/2004"), DateTime.Now);
foreach (Uri entry in entries)
{
    foreach (string name in GetComments(entry))
    {
        if (tally.ContainsKey(name))
        {
            // Seen before: bump the existing count
            tally[name]++;
        }
        else
        {
            // First comment from this person
            tally.Add(name, 1);
        }
    }
}
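The first-time special case can also be folded into a single code path. What follows is only a sketch, assuming `Dictionary<TKey, TValue>.TryGetValue` is available on your build (it is in the released .NET Framework 2.0; no promises for every interim Whidbey drop). The class and method names `TallySketch` and `Increment` are made up for illustration and are not part of the original tool.

```csharp
using System;
using System.Collections.Generic;

static class TallySketch
{
    // Adds one to the count stored for name, treating a missing key as zero.
    public static void Increment(Dictionary<string, int> tally, string name)
    {
        int count;
        // When the key is absent, TryGetValue returns false and sets
        // count to default(int), which is 0.
        tally.TryGetValue(name, out count);
        // The indexer setter inserts the key if it is missing,
        // or overwrites the existing value if it is present.
        tally[name] = count + 1;
    }
}
```

With this helper the inner loop body becomes a single call, `TallySketch.Increment(tally, name);`. The trade-off is that the pretty `tally[name]++` line disappears behind the helper.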


 


Part 2: The sort. I needed to do something kind of odd: sort the dictionary by its values rather than its keys. I do this to get a list ordered from most frequent commenter to least frequent commenter. That is not something even SortedDictionary can support. I considered writing my own collection for this, but given that I wanted to dogfood our stuff I elected to copy the items into a List<T>. The copy is an O(n) operation, but that is fine to do once before I output the results.


I then use one of the new “functional” members on List<T> that works nicely with C#’s anonymous methods. I am sure we could have some good debates about source code formatting for this one.


I took a look at the IL for this section – very interesting, maybe something we can get Eric or his comrades to blog about?  


     


List<KeyValuePair<string, int>> l = new List<KeyValuePair<string, int>>(tally);
l.Sort(
    delegate(KeyValuePair<string, int> obj1, KeyValuePair<string, int> obj2)
    {
        // Comparing obj2 to obj1 (not obj1 to obj2) sorts in descending order
        return obj2.Value.CompareTo(obj1.Value);
    }
);


 


 


Part 3: The report.  The report itself is very easy.   


foreach (KeyValuePair<string, int> k in l)
{
    Console.WriteLine("{0}={1}", k.Key, k.Value);
}


 


As a side note, I predict that use of C#’s using directive for compile-time type aliasing will increase dramatically as a result of generics usage like what I describe above. Consider how much cleaner this code is, even though it compiles to the exact same IL:


 


using NameCount = System.Collections.Generic.KeyValuePair<string, int>;

List<NameCount> l = new List<NameCount>(tally);
l.Sort(delegate(NameCount obj1, NameCount obj2)
{
    return obj2.Value.CompareTo(obj1.Value);
});
foreach (NameCount k in l)
{
    Console.WriteLine("{0}={1}", k.Key, k.Value);
}
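To try the sort without the screen-scraping code, the pieces above can be pulled into a small self-contained sketch. The names `ReportSketch` and `SortByCount` are hypothetical, and the caller supplies whatever tally it likes; only the alias, the copy into List<T>, and the anonymous-method comparison come from the post.

```csharp
using System;
using System.Collections.Generic;
using NameCount = System.Collections.Generic.KeyValuePair<string, int>;

static class ReportSketch
{
    // Copies the tally into a list and sorts it by count, highest first.
    public static List<NameCount> SortByCount(Dictionary<string, int> tally)
    {
        // Dictionary<string, int> enumerates as KeyValuePair<string, int>,
        // so it can seed the list directly (an O(n) copy).
        List<NameCount> sorted = new List<NameCount>(tally);
        sorted.Sort(delegate(NameCount a, NameCount b)
        {
            // Swapping the operands reverses the natural ascending order.
            return b.Value.CompareTo(a.Value);
        });
        return sorted;
    }
}
```

A caller would fill a `Dictionary<string, int>`, pass it to `ReportSketch.SortByCount`, and print each pair with `Console.WriteLine("{0}={1}", k.Key, k.Value)`. One thing to keep in mind: List<T>.Sort is not a stable sort, so commenters with equal counts can come out in any relative order.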


 


 


 

Comments (15)

  1. Steven Smith says:

    Brad,

    Is that last bit about the ‘using’ command new in Whidbey? Isn’t ‘using’ as a keyword already overloaded enough in C# with:

    using System.Drawing;

    // and

    using (Image myImage)

    {

    // do stuff with myImage here and then it’s disposed

    }

    I know this double-usage of the keyword confuses many people first learning the language – I’d hate to see *yet another* way of using ‘using’ (or, if it’s already there, I hate that instead…).

  2. Dan Crevier says:

    C++’s std::map will actually let you do tally[name]++ without the special case for it not existing. It inserts it into the map if it wasn’t already there.

  3. Wilco B. says:

    Steven: you can already use the ‘using’ keyword like Brad described in the current versions of .NET:

    using MyClass = System.Windows.Forms.Form;

    and

    using MyNamespace = System.Windows.Forms;

    etc. So it’s not a new thing in Whidbey.

  4. DuncS says:

    Could this:

    using NameCount = System.Collections.Generic.KeyValuePair<string, int>;



    List<NameCount> l = new List<NameCount>(tally);

    be changed to this?

    using NameCountList = List<KeyValuePair<string, int>>;



    NameCountList l = new NameCountList(tally);

  5. Ken Brubaker says:

    Kinda disappointed to see single character variables in Brad Abrams’s code. I know there is no specification for private and local variables but bad habits…

  6. Brad Abrams says:

    Ken, there is a reason why my guidelines only focus on PUBLICLY EXPOSED object models 😉

  7. Joe Duffy says:

    Holy C++.

    😉

  8. Mike Dunn says:

    List<KeyValuePair<string, int>>

    Not having Whidbey I can’t test this, but does the compiler parse ">>" according to context, so it doesn’t get mistaken for the right-shift operator?

  9. Reza A says:

    Wow, I love the anonymous method part in your code. Do we have Anonymous Class in Whidbey, or we should still use the nested class?

  10. Mike Dimmick says:

    So, am I going to be the only one who asks for the results? 😉

  11. damien morton says:

    With these anonymous methods, if the compiler can infer what kind of delegate is required, why can’t it also infer the parameters and their names:

    l.Sort(delegate(NameCount obj1,NameCount obj2) { return obj2.Value.CompareTo(obj1.Value); });

    I don’t know the exact details of this new Sort method, but clearly it takes a delegate of the approximate form:

    delegate int CompareDelegate<T>(T obj1, T obj2);

    Wouldn’t it be possible to infer that the anonymous function takes two arguments, that they are both of type T, and that their names are obj1 and obj2:

    l.Sort(delegate{return obj2.Value.CompareTo(obj1.Value)});

    You’d only need to use the long form if there was any ambiguity, or if you wanted to name the parameters differently.

  12. Damien,

    Brad asked me to answer this.

    The compiler could infer the parameters and names, and since you have to look at the delegate anyway, it would be easy to know the types initially.

    When you come back to read the code again, however, it would be harder to know what was going on.

    On the ‘long form / short form’ question, we generally try to avoid that kind of construct, as it makes the language more complex. Users have to learn another rule.

    Hope that makes sense,

    Eric

  13. damien morton says:

    Thanks for the response, Eric. I do understand the extra verbosity; I just figure that, if you’re going to start inferring things (such as delegate types), why not go the whole hog. Then again, I do take your point about this being a potential difficulty when it comes time to figure out what the hell is going on in some code. Thanks again for the response.

  14. Lionel Zhang says:

    A question about the using directive (I think this is C#’s version of typedef) in the last part: if the type ‘NameCount’ is incorrectly used, will the compiler say

    error CS1234: something wrong with ‘NameCount’

    or

    error CS1234: something wrong with ‘System.Collections.Generic.KeyValuePair<string, int>’?

    We’ve seen too many of such long, incomprehensible error messages in C++, especially when one combines templates to make complex types.