Extending the World

When people think of C# 3.0 and Linq, they commonly think of queries and databases.  The phenomenal work of the Linq to SQL guys provides ample reason to think of it this way; nevertheless, C# 3.0 and Linq are really much much more.  I have discussed a number of things that can be done with lambdas, expression trees, and queries and will continue to do so but I want to pause and discuss a little gem that is often overlooked in C# 3.0.  This new language feature has fundamentally changed both the way that I work in C# and my view of the world.  I've been using it a lot without ever drawing attention explicitly to it.  At least one reader noticed it and the possibilities it opens up and at least a couple of readers want an expanded version of it without even knowing it.

So what is the feature?  It's extension methods.

At first glance they don't look very special.  I mean really, all they are is one extra token in the definition of a static method inside of a static class.

static class Foo {

  public static Bar Baz(this Qux Quux) { ...

But as is usually the case, it's the semantics that are more interesting than the particular syntax.

The first argument of an extension method (the argument marked with this) is the implicit receiver of the method.  The extension method appears to be an instance method on the receiver but it is not.  Therefore, it cannot access private or protected members of the receiver.

For example, let's say that I detested the fact that the framework doesn't have a ToInt method defined on string.  Now, I can just provide my own:

public static int ToInt(this string s)
{
return int.Parse(s);
}

And I can then call it as:

"5".ToInt()

The compiler transforms the call into:

ToInt("5")

Notice how it turns it outside out.  So if I have three extension methods A, B, and C

x.A().B().C()

The calls get turned into

C(B(A(x)))

While all of this explains how extension methods work, it doesn't explain why they are so cool.

A few months back, I was reading various online content related to C# 3.0.  I wanted to get a feel for what customers were feeling and incorporate it as much as possible into the product.  In the process, I came across an interesting post, Why learning Haskell/Python makes you a worse programmer.  The author argues that learning a language like Python or Haskell can make things more difficult for you if your day job is programming in a language like C#.

I sympathize with what the author has to say and have had to spend enough time programming in languages that I didn't like that I think that I understand the pain.

That said, I hope that the author (and others who feel like him) will be pleasantly surprised by C# 3.0.  For example, let's look at his example of painful programming:

"I have a list of Foo objects, each having a Description() method that returns a string. I need to concatenate all the non-empty descriptions, inserting newlines between them."

In Python, he says that he would write:

"\n".join(foo.description() for foo in mylist if foo.description() != "")

In Haskell, his solution looks like:

concat $ List.intersperse "\n" $ filter (/= "") $ map description mylist

These both look like reasonable code and I rather like them.  Fortunately, you can express them in C# 3.0.  Here is the code that looks like the Python solution.

"\n".Join(from x in mylist where x.Description != "" select x.Description)

And here is the code that is closer to his Haskell solution:

mylist.Where(x => x.Description != "").Select(x => x.Description).Intersperse("\n").Concat();

At this point, some will protest that there is no Join instance method on string and there is no Intersperse defined on IEnumerable<T>.  And for that matter, how can you define a method on an interface in the first place?  Of course, extension methods are the answer to all of these questions.

public static string Join(this string s, IEnumerable<string> ss)
{
return string.Join(s, ss.ToArray());
}

public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
{
bool first = true;
foreach (var item in sequence)
{
if (first)
first = false;
else
yield return value;
yield return item;
}
}

It is as if these methods were defined on the receiver to begin with.  At this point the realization sets in: a whole new mode of development has been opened up.

Typically for a given problem, a programmer is accustomed to building up a solution until it finally meets the requirements.  Now, it is possible to extend the world to meet the solution instead of solely just building up until we get to it.  That library doesn't provide what you need, just extend the library to meet your needs.

I find myself switching between the two modes frequently: building up some functionality here and extending some there.  In fact, these days I find that I often start with extension methods and then when certain patterns begin to emerge then I factor those into classes.

It also makes some interesting styles of programming easier.  I am sure it has some name, but since I don't know what it is I'll call it data interface programming.  First we declare an immutable interface that includes only data elements.

interface ICustomer
{
string Name { get; }
int ID { get; }
}

Then, we declare an inaccessible implementation of ICustomer that allows customers to be created through a factory that only exposes the immutable version.

class Factory
{
class Customer : ICustomer
{
public string Name { get; set; }
public int ID { get; set; }
}

  public static ICustomer CreateCustomer(int id, string name)
{
return new Customer { ID = id, Name = name };
}
}

Then we can declare behavior through extension methods.

public static string GetAlias(this ICustomer customer)
{
return customer.Name + customer.ID.ToString();
}

And finally, we can use the behavior.

var customer = Factory.CreateCustomer(4, "wes");
Console.WriteLine(customer.GetAlias());

All of this may seem like a round about way to declare an immutable abstract base class with various derived classes.  But there is a fundamental difference, the interface and behavior can change depending upon which extension methods are in scope.  So one part of the program or system can treat them one way and another can have an entirely different view of things.

Of course, what I really want to be able to do (and we don't do it yet) is something like:

var customer = new ICustomer { ID = 4, Name = "wes" };
Console.WriteLine(customer.GetAlias());

And then I skip the whole Factory thing all together.  The customer is immutable and the definition of the type is short and sweet.  All of the work of done by the compiler which incidentally doesn't need the factory because it can name mangle the implementation class and provide customized constructors automatically.  But I digress, hopefully we can do something like that in the future.

Of course extension methods don't make the traditional techniques inapplicable, they are still as useful as ever.  As with all design considerations, there are trade-offs involved.  Care must be taken to manage extension methods so that chaos doesn't ensue, but when they are used appropriately they are fantastically useful.

As I have been writing C# code, I have accumulated a library of useful extension methods and I encourage you to do the same thing so that the ideas that you think roll naturally off of your fingertips.