Tricks you can play on yourself #789–Linq

I was profile some code this morning, and came across some interesting behavior.

Basically, we had some low level code that looked something like this:

IEnumerable<Guid> GetSpecialQuestionIds()
{
    return
      GetAllSpecialItems()
        .Select(specialItemXml => specialItemXml .CreateFromXml(questionXml))
        .SelectMany(specialItem => specialItem.Questions.Select(question => question.UniqueIdentifier)).Distinct();
        .Distinct();
}

So, it’s taking special item xml, deserializing each special item, and then grabbing all the unique question ids that are referenced by the special items. Perfectly reasonable code.

Elsewhere, we did the following (the original code was spread out over 3 different classes but I’ve simplified it for you):

var specialQuestionIds = GetSpecialQuestionIds();

foreach (Item item in items)
{
    var questions = item.Questions.Where(question => specialQuestionIds.Contains(question.UniqueIdentifier);
}

That also looks fine, but when I looked at the profile, I found that it was heavily dominated by the CreateFromXml() call. Well, actually, I did the profile first, and then looked at the code.

The problem is the call to Contains(). It will walk every entry in specialQuestionIds, which normally would be fine, but because it’s never been realized, it will deserialize all the special items… for every question in every item.

The fix is pretty simple – I changed GetSpecialQuestionIds() to call .ToList(), so that the deserialization only happened once, and the deserialization dropped from 65% down to 0.1% in the profile. And there was much rejoicing.

The lesson is that you should be careful whenever you return an IEnumerable<T> that isn’t super-cheap, because the person who gets it may enumerate it over and over.