Why all the love for lists?

Article
02/12/2007

One of the things that I have noticed when participating in interviews with potential candidates is that most candidates have a favorite data structure. Presumably this is the data structure that they feel the most comfortable with. The problem is that no matter what coding question I ask them, they see the answer through the lenses of their favorite data structure. I have met programmers who are in love with hash tables, others who have a certain fondness for B-trees, and quite a few who adore arrays. My feeling is that this happens not because they understand the data structure itself better than other data structures (honestly, I can't believe that the guy who loves b-trees doesn't understand what arrays are), but it is because each data structure encourages a certain way of thinking about problems. So really the candidates are just thinking about the question in a certain way that is expressed on the outside as using some particular data structure.

One of the first things that a newcomer to functional programming will notice is that lists are all the rage. Where most programmers who are accustomed to imperative style would naturally use an array, a variable, or a mutable object, a functional programmer will often use a list. Why is this?

If we consider the insight that it is not the data structure itself but rather the mode of thinking that encourages the usage of the data structure then we can make some progress on understanding this. Lists are the simplest recursive data structure. A list is just a node consisting of a value and the next node in the list. The definition of a list is defined in terms of itself.

interface IList<T>
{
IList<T> Next { get; }
T Value { get; }
}

But notice how similar this way of thinking is to recursive functions. For example a very common function defined over lists is Length. This function computes the length of a given list.

public static int Length<T>(this IList<T> list)
{
if (list == null)
return 0;
return 1 + Length(list.Next);
}

Length is a function that is defined in terms of itself. Like the definition of list, Length has a value (either the 0 or the 1 +) and it has a next (the recursive call to Length(list.Next)). So one reason why lists are so popular in functional programming is that they encourage a recursive way of thinking about problems.

In C# 3.0, lists are not so commonly used as IEnumerable<T>. In fact, IEnumerable<T> is used so much that it sticks out in the same way the usage of lists in functional programming languages sticks out. The primary Linq provider that will be shipped is Linq to Objects which requires that the source of a query implement IEnumerable<T>.

In some senses, IEnumerable<T> is not so very different from the definition of IList<T> shown above. Both of them represent sequences of elements where only the current value and the next element are available at a given time. In fact, it is relatively easy to define one in terms of the other.

Here is a definition of IEnumerable<T> in terms of IList<T>.

public static IEnumerable<T> GetEnumerator<T>(this IList<T> list)
{
for (var current = list; current != null; current = current.Next)
yield return current.Value;
}

And here is a corresponding definition of IList<T> in terms of IEnumerable<T> (take from this post).

class LazyList<T> : IList<T>
{
IList<T> next;
T value;
IEnumerator<T> enumerator;

LazyList(T value, IEnumerator<T> enumerator)
{
this.value = value;
this.enumerator = enumerator;
}

public static IList<T> Create(IEnumerable<T> enumerable)
{
return Create(enumerable.GetEnumerator());
}

public static IList<T> Create(IEnumerator<T> enumerator)
{
if (enumerator.MoveNext())
return new LazyList<T>(enumerator.Current, enumerator);
return null;
}

public IList<T> Next
{
get
{
if (enumerator != null)
{
next = Create(enumerator);
enumerator = null;
}
return next;
}
}

public T Value { get { return value; } }
}

But it should be noted that there is a critical difference between both of these definitions and the following definition of List<T>.

class List<T> : IList<T>
{
T value;
IList<T> next;

public List(T value, IList<T> next)
{
this.value = value;
this.next = next;
}

public IList<T> Next { get { return next; } }
public T Value { get { return value; } }

public static IList<T> Empty { get { return null; } }
}

Did you notice what it is? The previous definitions were lazy but this definition is not. The iterator did not describe the data itself but rather a process for sequencing data that will be executed possibly sometime in the future. The LazyList also does not actually realize an entire list. It describes a way to construct the next element of a list on demand. But the definition of List does not defer anything to the future. Now the question is, does it matter?

Why all the love for lists?

Additional resources