LINQ to Objects and Buffer<T>


I was debugging a problem a couple of days ago, when I learned something interesting.

I had some code that looked like this:

foreach(Employee e in manager.Manages)
{
    …
}

The manager.Manages property returned a custom implementation of ICollection<Employee> that lazy-loads its data from somewhere when enumerated.

I wanted to change the code to display the manager’s reports sorted by first name.

So I tried this:

foreach(Employee e in manager.Manages.OrderBy(r => r.Firstname))
{
    …
}

Pretty simple, you would think.

Unfortunately, when I did this, it suddenly looked like the manager had no reports at all, which I knew was not the case.

Hmm…

So I started to delve into the implementation of the OrderBy(…) extension method. After some digging I found out that internally it uses a struct called Buffer<T>.

Why Buffer<T> is used is pretty obvious. In order for the OrderBy(…) method to do a sort it must have all the values in a buffer somewhere.

When I looked at the implementation of Buffer<T> in Reflector, I noticed that whenever the Buffer<T> struct encounters an IEnumerable<T> that also implements ICollection<T>, it applies a little optimization: it uses Count and CopyTo(..) to initialize an array of the correct size and take a copy of the data, rather than enumerating and re-sizing a target array as needed.
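A minimal sketch of that optimization might look like the following. It is a simplified illustration and not the framework source: BufferSketch and ToBuffer are hypothetical names, and the real Buffer<T> manages its array differently on the slow path.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mimics the shortcut described above.
static class BufferSketch
{
    public static T[] ToBuffer<T>(IEnumerable<T> source)
    {
        var collection = source as ICollection<T>;
        if (collection != null)
        {
            // Fast path: trust Count and copy in one step.
            // GetEnumerator() is never called on this path.
            var items = new T[collection.Count];
            if (items.Length > 0)
            {
                collection.CopyTo(items, 0);
            }
            return items;
        }

        // Slow path: enumerate and let the list grow as needed.
        var list = new List<T>();
        foreach (T item in source)
        {
            list.Add(item);
        }
        return list.ToArray();
    }
}
```

The point to notice is that the fast path depends entirely on Count being accurate before anything has been enumerated.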

When I thought about this, I realized that my little lazy-loading collection class would return a Count of 0 until it was enumerated. And because the Count was 0, the collection would never actually get enumerated.

So when I used OrderBy(..) over my collection, I got no results.
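To make the failure concrete, here is a small reproduction. It is only a sketch under assumptions: LazyCollection<T> and its loader delegate are hypothetical stand-ins, since the original collection class is not shown.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lazy collection: data is loaded only on first enumeration.
class LazyCollection<T> : ICollection<T>
{
    private readonly Func<IEnumerable<T>> _loader;
    private List<T> _items; // stays null until Load() runs

    public LazyCollection(Func<IEnumerable<T>> loader) { _loader = loader; }

    private List<T> Load()
    {
        if (_items == null) _items = _loader().ToList();
        return _items;
    }

    // The problem: Count does not trigger the load, so it reports 0
    // until something has enumerated the collection.
    public int Count { get { return _items == null ? 0 : _items.Count; } }
    public bool IsReadOnly { get { return false; } }

    public void Add(T item) { Load().Add(item); }
    public void Clear() { Load().Clear(); }
    public bool Contains(T item) { return Load().Contains(item); }
    public bool Remove(T item) { return Load().Remove(item); }
    public void CopyTo(T[] array, int index) { Load().CopyTo(array, index); }

    public IEnumerator<T> GetEnumerator() { return Load().GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

With a collection like this, new LazyCollection<string>(() => new[] { "Carol", "Alice", "Bob" }).OrderBy(n => n) should yield nothing, because the sort reads Count, sees 0, and never enumerates. A plain foreach over the same instance loads the data, and sorting works from then on.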

Nasty!

Solution: I simply triggered a LazyLoad whenever Count is called too.
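A sketch of what that fix could look like, assuming a hypothetical EagerCountCollection<T> with a loader delegate (the original class is not shown, so every name here is illustrative):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lazy collection whose Count forces the load.
class EagerCountCollection<T> : ICollection<T>
{
    private readonly Func<IEnumerable<T>> _loader;
    private List<T> _items; // stays null until Load() runs

    public EagerCountCollection(Func<IEnumerable<T>> loader) { _loader = loader; }

    private List<T> Load()
    {
        if (_items == null) _items = _loader().ToList();
        return _items;
    }

    // The fix: asking for Count triggers the lazy load, so callers that
    // rely on Count plus CopyTo see the real data.
    public int Count { get { return Load().Count; } }
    public bool IsReadOnly { get { return false; } }

    public void Add(T item) { Load().Add(item); }
    public void Clear() { Load().Clear(); }
    public bool Contains(T item) { return Load().Contains(item); }
    public bool Remove(T item) { return Load().Remove(item); }
    public void CopyTo(T[] array, int index) { Load().CopyTo(array, index); }

    public IEnumerator<T> GetEnumerator() { return Load().GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

The trade-off is that merely reading Count now pays the full loading cost, which is the price of honoring the ICollection<T> contract.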

Moral of the Story: if you want your custom collection classes to play nicely with LINQ operators, you had better return the correct Count!

Comments (3)

  1. fmarguerie says:

    My own moral of the story would have been: when your collection is based on delayed execution, use the IEnumerable<T> type for it (and not ICollection<T>).

    Don’t you think it’s better this way?

  2. Alex D James says:

    IEnumerable<T> would definitely make sense in some scenarios.

    But there are a number of things that made that not an option.

    1) The collection wasn’t readonly.

    2) More importantly, I wasn’t following the recognized semantics of ICollection<T>, and that is what caused the problem. If I had implemented ICollection<T>.Count as the documentation recommends (and as Buffer<T> correspondingly assumes), I wouldn’t have run into this problem.

    And finally a very weak one…

    If I had implemented IEnumerable<T>, then this:

    from p in ctx.Person
    where p.Reports.Count > 10
    select p;

    wouldn’t work, although of course

    from p in ctx.Person
    where p.Reports.Count() > 10
    select p;

    would. The point is subtle, but interesting, because if you implement IQueryable<T> you have to implement translations not just for your classes but for extension methods too…

    Alex