Extending the World


When people think of C# 3.0 and Linq, they commonly think of queries and databases.  The phenomenal work of the Linq to SQL guys provides ample reason to think of it this way; nevertheless, C# 3.0 and Linq are really much much more.  I have discussed a number of things that can be done with lambdas, expression trees, and queries and will continue to do so but I want to pause and discuss a little gem that is often overlooked in C# 3.0.  This new language feature has fundamentally changed both the way that I work in C# and my view of the world.  I’ve been using it a lot without ever drawing attention explicitly to it.  At least one reader noticed it and the possibilities it opens up and at least a couple of readers want an expanded version of it without even knowing it.

So what is the feature?  It’s extension methods.

At first glance they don’t look very special.  I mean really, all they are is one extra token in the definition of a static method inside of a static class.

static class Foo {

  public static Bar Baz(this Qux Quux) { …

But as is usually the case, it’s the semantics that are more interesting than the particular syntax.

The first argument of an extension method (the argument marked with this) is the implicit receiver of the method.  The extension method appears to be an instance method on the receiver but it is not.  Therefore, it cannot access private or protected members of the receiver.

For example, let’s say that I detested the fact that the framework doesn’t have a ToInt method defined on string.  Now, I can just provide my own:

public static int ToInt(this string s)
{
  return int.Parse(s);
}

And I can then call it as:

"5".ToInt()

The compiler transforms the call into:

ToInt("5")

Notice how the compiler turns the call inside out.  So if I have three extension methods A, B, and C

x.A().B().C()

The calls get turned into

C(B(A(x)))
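The rewriting can be checked with a small sketch.  The helper names Doubled and Describe and the class name StringExtensions are made up for this illustration; only ToInt comes from the text above.

```csharp
using System;

static class StringExtensions
{
    // ToInt is from the post; Doubled and Describe are hypothetical helpers.
    public static int ToInt(this string s) { return int.Parse(s); }
    public static int Doubled(this int n) { return n * 2; }
    public static string Describe(this int n) { return "value: " + n; }
}

static class Program
{
    static void Main()
    {
        // Extension-method syntax reads left to right...
        string chained = "5".ToInt().Doubled().Describe();

        // ...and means the same thing as the nested static calls.
        string nested = StringExtensions.Describe(
            StringExtensions.Doubled(StringExtensions.ToInt("5")));

        Console.WriteLine(chained == nested); // both are "value: 10"
    }
}
```

Both forms should compile to the same static calls; the chained form simply puts the receiver first.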

While all of this explains how extension methods work, it doesn’t explain why they are so cool.

A few months back, I was reading various online content related to C# 3.0.  I wanted to get a feel for what customers were feeling and incorporate it as much as possible into the product.  In the process, I came across an interesting post, Why learning Haskell/Python makes you a worse programmer.  The author argues that learning a language like Python or Haskell can make things more difficult for you if your day job is programming in a language like C#.

I sympathize with what the author has to say and have had to spend enough time programming in languages that I didn’t like that I think that I understand the pain.

That said, I hope that the author (and others who feel like him) will be pleasantly surprised by C# 3.0.  For example, let’s look at his example of painful programming:

“I have a list of Foo objects, each having a Description() method that returns a string. I need to concatenate all the non-empty descriptions, inserting newlines between them.”

In Python, he says that he would write:

"\n".join(foo.description() for foo in mylist if foo.description() != "")

In Haskell, his solution looks like:

concat $ List.intersperse "\n" $ filter (/= "") $ map description mylist

These both look like reasonable code and I rather like them.  Fortunately, you can express them in C# 3.0.  Here is the code that looks like the Python solution.

"\n".Join(from x in mylist where x.Description != "" select x.Description)

And here is the code that is closer to his Haskell solution:

mylist.Where(x => x.Description != "").Select(x => x.Description).Intersperse("\n").Concat();

At this point, some will protest that there is no Join instance method on string and there is no Intersperse defined on IEnumerable<T>.  And for that matter, how can you define a method on an interface in the first place?  Of course, extension methods are the answer to all of these questions.

public static string Join(this string s, IEnumerable<string> ss)
{
  return string.Join(s, ss.ToArray());
}


public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
{
  bool first = true;
  foreach (var item in sequence)
  {
    if (first)
      first = false;
    else
      yield return value;
    yield return item;
  }
}
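The Haskell-style pipeline also ends in a parameterless Concat, which the framework's Enumerable.Concat does not offer (it takes a second sequence).  Here is a minimal sketch of what such a helper might look like.  The Concat body, the Foo class, and the sample data are assumptions made for illustration, not code from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for the Foo objects in the quoted problem.
class Foo
{
    public string Description { get; set; }
}

static class SequenceExtensions
{
    // Intersperse as defined above, repeated so the sketch compiles on its own.
    public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
    {
        bool first = true;
        foreach (var item in sequence)
        {
            if (first)
                first = false;
            else
                yield return value;
            yield return item;
        }
    }

    // Assumed helper (not shown in the post): glue a sequence of strings together.
    public static string Concat(this IEnumerable<string> strings)
    {
        var builder = new StringBuilder();
        foreach (var s in strings)
            builder.Append(s);
        return builder.ToString();
    }
}

static class Program
{
    static void Main()
    {
        var mylist = new List<Foo>
        {
            new Foo { Description = "first" },
            new Foo { Description = "" },
            new Foo { Description = "second" }
        };

        string text = mylist.Where(x => x.Description != "")
                            .Select(x => x.Description)
                            .Intersperse("\n")
                            .Concat();

        Console.WriteLine(text == "first\nsecond"); // True
    }
}
```

Because the one-argument Enumerable.Concat is not applicable to a call with no arguments, the compiler picks the extension defined here.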

It is as if these methods were defined on the receiver to begin with.  At this point the realization sets in: a whole new mode of development has been opened up.

Typically, for a given problem, a programmer is accustomed to building up a solution until it finally meets the requirements.  Now it is possible to extend the world to meet the solution instead of only building up until we reach it.  If a library doesn’t provide what you need, just extend the library to meet your needs.

I find myself switching between the two modes frequently: building up some functionality here and extending some there.  In fact, these days I find that I often start with extension methods and then, when certain patterns begin to emerge, I factor those into classes.

It also makes some interesting styles of programming easier.  I am sure it has some name, but since I don’t know what it is I’ll call it data interface programming.  First we declare an immutable interface that includes only data elements.

interface ICustomer
{
  string Name { get; }
  int ID { get; }
}

Then, we declare an inaccessible implementation of ICustomer that allows customers to be created through a factory that only exposes the immutable version.

class Factory
{
  class Customer : ICustomer
  {
    public string Name { get; set; }
    public int ID { get; set; }
  }

  public static ICustomer CreateCustomer(int id, string name)
  {
    return new Customer { ID = id, Name = name };
  }
}

Then we can declare behavior through extension methods.

public static string GetAlias(this ICustomer customer)
{
  return customer.Name + customer.ID.ToString();
}

And finally, we can use the behavior.

var customer = Factory.CreateCustomer(4, "wes");
Console.WriteLine(customer.GetAlias());

All of this may seem like a roundabout way to declare an immutable abstract base class with various derived classes.  But there is a fundamental difference: the interface and behavior can change depending upon which extension methods are in scope.  So one part of the program or system can treat them one way and another can have an entirely different view of things.
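To make the scoping point concrete, here is a rough sketch in which two parts of a program see different behavior on the same ICustomer.  The namespaces Model, Billing, Support, and App, the Describe methods, and the Report classes are all invented for this illustration, and the Customer class is public here (rather than hidden behind the Factory) only to keep the example short.

```csharp
using System;

namespace Model
{
    public interface ICustomer
    {
        string Name { get; }
        int ID { get; }
    }

    public class Customer : ICustomer
    {
        public string Name { get; set; }
        public int ID { get; set; }
    }
}

namespace Billing
{
    using Model;

    public static class CustomerExtensions
    {
        // Billing's view of a customer (hypothetical).
        public static string Describe(this ICustomer c) { return "Invoice account #" + c.ID; }
    }
}

namespace Support
{
    using Model;

    public static class CustomerExtensions
    {
        // Support's view of the same customer (hypothetical).
        public static string Describe(this ICustomer c) { return "Ticket contact: " + c.Name; }
    }
}

namespace App.BillingSide
{
    using Model;
    using Billing; // only Billing's Describe is in scope here

    public static class Report
    {
        public static string For(ICustomer c) { return c.Describe(); }
    }
}

namespace App.SupportSide
{
    using Model;
    using Support; // only Support's Describe is in scope here

    public static class Report
    {
        public static string For(ICustomer c) { return c.Describe(); }
    }
}

namespace App
{
    using Model;

    static class Program
    {
        static void Main()
        {
            ICustomer customer = new Customer { ID = 4, Name = "wes" };
            Console.WriteLine(BillingSide.Report.For(customer)); // Invoice account #4
            Console.WriteLine(SupportSide.Report.For(customer)); // Ticket contact: wes
        }
    }
}
```

The call c.Describe() is spelled identically in both Report classes; the using directive in each namespace decides which method it binds to.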

Of course, what I really want to be able to do (and we don’t do it yet) is something like:

var customer = new ICustomer { ID = 4, Name = "wes" };
Console.WriteLine(customer.GetAlias());

And then I skip the whole Factory thing altogether.  The customer is immutable and the definition of the type is short and sweet.  All of the work is done by the compiler, which incidentally doesn’t need the factory because it can name-mangle the implementation class and provide customized constructors automatically.  But I digress; hopefully we can do something like that in the future.

Of course, extension methods don’t make the traditional techniques inapplicable; they are still as useful as ever.  As with all design considerations, there are trade-offs involved.  Care must be taken to manage extension methods so that chaos doesn’t ensue, but when they are used appropriately they are fantastically useful.

As I have been writing C# code, I have accumulated a library of useful extension methods, and I encourage you to do the same so that the ideas you think of roll naturally off your fingertips.

Comments (26)

  1. Jafar says:

    I think I’m missing the point.  How does this approach improve on traditional approaches to polymorphism?  An example would be helpful.

  2. Sushant Bhatia says:

    Awesome! You know what would be cool….a website where you can share extension methods.

    I bet you’d see a ton of stuff for math, image manipulations, string & regex.

  3. Derek says:

    Extension methods are really quite nice. It’s interesting that you used Join() as an example, since that’s the first extension method I wrote (and probably the one that’s proved most broadly useful thus far). Here’s a version that skips the array translation:

    static string Join<T>(this IEnumerable<T> value,
                         string separator,
                         Converter<T, string> converter) {
       StringBuilder joined = new StringBuilder(128);
       IEnumerator<T> enumerator = value.GetEnumerator();
       if (enumerator.MoveNext()) {
           for (;;) {
               joined.Append(converter(enumerator.Current));
               if (enumerator.MoveNext()) {
                   joined.Append(separator);
               } else {
                   break;
               }
           }
       }
       return joined.ToString();
    }

    This particular overload also contains a third parameter (second in extension method syntax) which I’ve found to really expand the utility of the method. For example, in Web projects you often have to encode data, e.g.:

    Response.Write(listOfStrings.Join(", ", HttpUtility.HtmlEncode));

    It’s also an efficient way of joining things of the wrong type:

    Response.Write(new int[] { 1, 2, 3 }.Join(", ", Convert.ToString));

    Or the _really_ wrong type (thanks lambda syntax!):

    Response.Write(GetUserObjects().Join(", ", u => HttpUtility.HtmlEncode(u.FullName)));

    Here are a few other Ruby inspired methods:

    http://derekslager.com/blog/posts/2006/10/channeling-ruby-in-csharp-3.ashx

  4. wesdyer says:

    Jafar:

    Thank you for asking for clarification.  I don’t think that extension methods necessarily improve upon traditional approaches so much as they provide an alternative with different tradeoffs (another tool in the toolbox).

    So in that light, I can name a number of ways that they are very useful.

    1.  They are incredibly lightweight to define compared with more traditional techniques

    2.  They allow classes to be extended after the fact even if the implementation is not accessible to the developer

    3.  They allow different parts of the same program to have different views of the behavior of various types

    4.  They allow behavior to be defined over interfaces

    There are other reasons too.  If I have time later on then I’ll post some explicit examples.

    Sushant:

    Long time no see.  You can post your code on http://www.codeplex.com

    Derek:

    I love your definition of Join and your post on extension methods.  Thank you for sharing it.

  5. Mark Grant says:

    Surely this also means that you can write more static side-effect free code but treat it like an instance method.

    Having originally started programming in imperative/ OO languages and moving rapidly towards functional programming I have recently come unstuck about whether to make methods instance methods or static.

    This answers the problem in one step – make them static – ergo side-effect free and stateless – and treat them like an instance method – more easily readable dot notation.

  6. Tom Kirby-Green says:

    Me too. The whole static vs. instance thing is causing me no end of pain as I try and fold (sic) the functional stuff into my OO background.

    I’d really appreciate any guidance from folks who’ve been using C# 3 long enough to have written some medium to large code bases in it already.

  7. wesdyer says:

    Mark:

    I agree.  Well put.

    Tom:

    That is a great question to ask and I hope that people can comment on that.  One piece of guidance that I can give is that I am sure that you are a wonderful software developer.  So I encourage you to try it out on less critical apps first and get your groove so to speak.  Figure out what works and what doesn’t for you.  I know that as I use C# 3.0 more and more, my style is constantly evolving.

    Besides what I have already said, one thing that is always a little strange (even with the strategy pattern or whatever) is the expression of non-trivial algorithms in OO code.  I am finding that many times they fit more naturally as extension methods.  But I absolutely love them as a toolbox and as a quick prototyping way of working.

  8. rektide says:

    I’ve been a Boo fanatic for quite a while. It’s a Python-inspired CLR language, and it has had extension methods for a while.  I love them so much!!!  I have a stdlib of string and IEnumerable extensions (see: map) that I use everywhere.

    Of course, F# came out a little while later, and it has almost all of the IEnumerable extensions I’d been so diligently re-creating.

  9. damien morton says:

    Extension methods also allow for some other unusual programming styles. I read a paper called "First-class relationships in an object-oriented language", in which the authors propose a new language that supports their thesis. Using extension methods, no such language is needed.

    See http://blog.lab49.com/?p=237 for more details.

  10. damien morton says:

    The one thing about extension methods that worries me is the packaging.

    They are static members of a class (possibly a static class) that exists in a namespace. Importing that namespace then imports all the extension methods in all the classes in the namespace. OK, so what’s the role of the classes in which the extension methods are defined?

    Now, namespaces are pretty fuzzy things to start off with. They aren’t restricted to belonging to any particular assembly, file or folder. Any class in any assembly can potentially belong to any namespace.

    The end result is that you have a very fuzzy way of packing up bundles of extension methods for use, one in which you never really know where your extension methods are coming from. Maybe today they come from your known assemblies, but tomorrow they are coming from somewhere unexpected.

    I’d rather have the classes containing extension methods have to be explicitly named as imported for extension. You can’t have duplicate classes in a namespace, so at least you get some warning if there are collisions.

  11. Sadek Drobi says:

    – I fully agree about your use of extension methods, but I guess that as they give more flexibility to the developer, discipline is needed. One can chase after extension methods and quickly find himself in a mess of packages and classes. Having said that, I rather like your implementation. I think extension methods can be most handy for expression builders and fluent interfaces; I mean, as most readers said, side-effect-free helper functions that can be used in fluent-interface style.

    – I really moaned a lot about anonymous classes, and I would really love to see them in C# 3.0 (I don’t want to wait for another version).

    – I posted on my blog about extension methods (as a reflection on Martin Fowler’s Fluent Interfaces post)

    " OBSEV:: Fluent Interface and c# 3.0 Extension Methods : The flexibility of dynamic typing with the powerfull AutoCompletion "

    I guess it is worth a read: http://sadekdrobi.com/?p=22

    – I like statically typed languages; extension methods came to offer me some flexibility I really lacked before.

    By the way, it might be a good idea to have something like a namespace interface, where we can switch implementations for extension methods, kind of AOP :p . Anyway, that’s an idea to suggest to the research guys 🙂

  12. wesdyer says:

    Damien:

    That is an awesome post.  I hadn’t thought of pulling cyclical dependencies out and using extension methods to manage them.  Very nice indeed.

    I agree about the packaging thing.  Use them with caution.  Possibly include each set of related extension methods in their own namespace.  We are thinking about things (post orcas) that would extend and improve the situation.

    Sadek:

    Interesting stuff.

  13. damien morton says:

    Post-Orcas it will no doubt be too late to fix the extension-method packaging problem. It’s a wart, and better to fix it now before it’s set in stone. Too many warts accumulating in C# as it is.

  14. Alex James says:

    Thanks for the link Wes.

    BTW I’ve tried walking a tree with extension methods and LINQ and wondered if you could see a better way… wouldn’t surprise me!

  15. Tom Kirby-Green says:

    Hi Wes,

    I’ve been playing around a lot with both C# 3 and C# 2, trying to push the whole functional thing as far as makes sense in each respective iteration of the language (I mean, no one’s suggesting throwing out the OO baby with the bath water, right?). In so doing I’m finding that in terms of supporting functional composition (or ‘pipelining’) it seems as if it may be better to trend towards returning an empty rather than null collection. So empty means ‘nothing’ and null is not used, in favor of an exception being thrown.

    Given that the cost of a managed new is on average considerably lower than an unmanaged malloc, I find I’m less nervous about writing code like this than I might be in, say, C++. Added to which, these empty collections when spun up during a pipeline usually have very short life spans since they’re almost always either return values or parameters (occasionally locals), but not member variables.

    I was wondering what your thoughts were on the subject. Clearly, like anything, it could be taken too far, for example if some strange custom container had a computationally expensive default constructor (hard to imagine really in the general case).

    I’m aware that the empty vs. null debate is not a new one – I’m just interested to know if writing in a FP style favors one approach over another. Perhaps some of the other FP vets could offer up their experience on the subject?

    Kind regards,

    tom

  16. wesdyer says:

    Alex:

    I like it.  You’ll want to check out my Linq perf post when I finish it for some details that are related to your implementation.

    Tom:

    I agree.  Do not throw the OO baby out!

    I like the idea of returning empty collections especially since all empty collections are created equal (of a given type).  So you really don’t even need to new them up very often (though they are relatively cheap as you indicate).

    Personally, I really like removing null as much as possible (without doing it for its own sake).  So one class that I often use is IOptional<T> which is similar to Nullable<T> but for reference types.

  17. damien says:

    It would be nice if we could define static functions in a namespace without having to have a useless wrapping class.

    Perhaps every namespace should have a hidden static class that holds the free functions in that namespace. Importing the namespace would be equivalent to importing the functions defined in that hidden class.

    You could still package up functions in a static class, but that static class would need to be imported explicitly for those functions to be available as free functions or extension methods.

    By free function, I mean a function that can be called without a qualifying class prepended.

  18. C# has been a class based language since the get-go so it’s hard to imagine it changing to the degree that we’d be able to write free functions in it.

    One alternative might be a ‘with’ (‘using’ is overloaded enough) style keyword that would open up a static class into the current scope and allow you to omit the ‘<ClassName>.’ from a function invocation. Of course you can always point a Func<…> at a member function and then reference it using just the variable name, which gets you closer to what you want today (i.e. it works in both C# 2 and 3).

  19. damien says:

    As long as c# had static methods, its class-based nature was merely a figleaf. I mean, Math.Sin(x) is merely more verbose than Sin(x) – no other benefit accrues.

    Perhaps the namespace scope resolution operator :: can be applied to the using directive, e.g.

    using System::Math;

    using System.Linq::Query.*;

    using Foo.Bar::Baz.Boink;

    The first directive would import the Math class from the System namespace. The second directive would import the methods of the Query class as free functions or extension methods. The third directive would import the Boink() method of the Baz class of the Foo.Bar namespace.

    Syntactic sugar. Removes nothing, adds precision and conciseness simultaneously.

    All greed? Ok, then, dev team, do your stuff.

  20. wesdyer says:

    Tom & Damien:

    We have considered it several times and it is certainly a possibility for the post Orcas timeframe.  I would love some way to do this.

  21. Jacob says:

    Why not just:

    public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
    {
       yield return sequence.First();
       foreach (var item in sequence.Skip(1))
       {
           yield return value;
           yield return item;
       }
    }

    ?

  22. wesdyer says:

    That works very well except if sequence doesn’t contain anything.  Which can be solved if you add an if statement before yielding the first item.

  23. Last week the Microsoft MVPs converged on Redmond from all corners of the globe. It was a great occasion

  24. EricWhite says:

    This is a good explanation of how to write an extension method. One thing: you can write the above code without Intersperse and Join, still in an FP style:

    mylist.Where(x => x.Description != "").Select(x => x.Description).Aggregate("", (s, i) => s + i + "\n");

    If you don’t like all of the short-lived string objects on the heap, this works too:

    mylist.Where(x => x.Description != "").Select(x => x.Description).Aggregate(new StringBuilder(), (s, i) => s.Append(i).Append("\n"), s => s.ToString());

  25. David Júlio says:

    I was reading a post by yet another Microsoft guru, Wes Dyer. In it, he presented an application

  26. Anonymous says:

    This would be closer to the Haskell version:

    mylist.Select(x => x.Description).Where(d => d != "").Intersperse("\n").Concat();