Small in, Big out


I’ve come across some code that is using – and overusing – IEnumerable<T>, and thought it might be of general interest.

Consider the following method signatures:

IEnumerable<string> Process(List<string> input);

IList<string> Process(IEnumerable<string> input);

Of the two, which is the better choice? Write down your answer.

There are really two questions – what is the best choice for input parameters, and what is the best choice for return values. I’ll cover them separately:

Small In

The input parameter should be the smallest (ie least functional) type that allows the method to efficiently do what it needs to do. I’ve seen methods like this:

void Process(List<string> input)
{
    foreach (string in input)
    {
        …
    }
}

(or maybe the Linq equivalent).

In this case, you are just making things harder for the caller; all you really need is an IEnumerable<string>, so that’s what you should specify. On the other hand, I’ve also seen code like this:

void Process(IEnumerable<string> items, IEnumerable<ReferenceItem> referenceItems)
{
    var lookup = referenceItems.ToDictionary(referenceItem => referenceItem.Name, referenceItem => referenceItem.Contents)’
    …
}

This code takes a simple enumerable, and constructs a dictionary out of it. This is worse than the first case; if you need a dictionary, ask for a dictionary.

Big Out

Output parameters should be the biggest (ie most functional) type that is acceptable for the scenario. I’ve seen methods like this:

IEnumerable<string> Process(List<string> input)
{
    List<string> items = new List<string>();

    // fill list here

    return items;
}

The developer has created a beautiful, functional list object, but they’re only going to let the caller enumerate over it. This often leads to my favorite outcome, where the caller writes something like:

List<string> items = new List<string>(myClass,Process(input));

Don’t do that. Just return the List<string> or perhaps the IList<string>.

Return values aren’t as cut-and-dried as input parameters, however. If we modify the previous example as follows:

IList<string> Process(List<string> input)
{
    m_items = new List<string>();

    // fill list here

    return m_items;

In this example, the class is retaining ownership of the list, and in that case, it probably doesn’t want to give it out to somebody who could clear it, or replace all the strings with the string “Haddock”. If the class is going to retain ownership, it should use a return type that prevents bad things like that from happening.


Comments (10)

  1. configurator says:

    That example:

    IEnumerable<string> Process(List<string> input)  {

       List<string> items = new List<string>();

       // fill list here

       return items;

    }

    Should often be replaced with code that never even builds the list in the first place, i.e. using yield return. That's the best reason to use IEnumerable<T> everywhere (in input and output parameters) I've found yet.

  2. runefs says:

    It's not the return type that's important in your last example but what's being returned. Changing the return type to `IEnumerable<string>` won't make it anymore safe but changing the return statement to m_items.ToList() without changing the return type would

  3. Meta-Knight says:

    Thank you for this helpful advice.

  4. JB says:

    Thanks for sharing this!  It is one item I've been pointing out in our internal code reviews in recent months.  We definitely want to make our method signatures as useful and friendly to the calling applications as we can.

  5. Bruce Wayne says:

    Couldnt agree more – I've seen ILists being returned as IEnumerables more and more recently, and it's a disturbing trend that could only have come from some purist madman. To add another argument to yours: IEnumerable is lazy – there's no way the consumer of the api can know a) what sort of performance hit each enumeration will yield (does it go over the wire too?) and b) how long it is. The simply cant count, or index into it, without assuming it'll be iterating madly. And 9 times out if 10, it isn't – it's actually an IList under the covers, and people simply .ToList() it to make it more useful again as soon as they've got a result. Better to return it as an IList and keep IEnumerable for its most common assumed purpose in APIs (eg. linq) – lazy evaluation.

  6. Bruce Wayne says:

    Some people will argue that IList has Add, Remove, Insert methods – and I agree in sorts there's space for an immutable list interface which only includes .Count and an indexer, however this is not a good reason to fall back to IEnumerable for the previous reasons. We have ReadOnlyCollection and IList.IsReadOnly for immutable collections for now.

  7. ericgu says:

    configurator – If it is convenient to writen an enumerator, sure, I would not recommend creating the list. But some operations naturally result in the whole list.

    runefs – Sure, you are correct; somebody could case the IEnumerable<string> back to List<string>. The question is whether I care. If it's a component I ship externally and/or there are security implications, then I might care, but in the majority of cases I don't. If somebody does that, they get what they get, just as they would if they used reflection to call internal methods. Calling ToList() in situations where it doesn't matter doesn't make sense to me.

  8. Bruce Wayne says:

    …. ah yeah, @configurators post reminds me – one last thing on this. I return IEnumerable<T> a fair bit too, where appropriate – the yield pattern is a great example of this

  9. Marc says:

    I've never encountered a for List<> with 2 parameters as you have in your example: new List<string>(myClass,Process(input));

    What does that do?

  10. ericgu says:

    Should be (myClass.Process(input))