IEnumerable Distinct through a Func extension method

In a recent posting, available here, I tried to explain why the IEnumerable Distinct() method employs an IEqualityComparer implementation. In this posting I will provide a simple extension to IEnumerable for the Distinct() method that allows one to use a Func to determine a property on which the Distinct() method can operate.

The extension method approach provides a much simpler process for when distinct operations are needed on single property values.

Using the Product type outlined in the previous posting, this will allow the following Distinct() operation:

 IEnumerable<Product> distinctProducts =
    products.Distinct<Product, string>(prod => prod.UniqueIdentifier);

The UniqueIdentifier property is a read-only value that determines the product uniqueness:

 public string UniqueIdentifier
{
    get
    {
        if (this.ProductVersion == null)
        {
            return this.Reference;
        }
        else
        {
            return string.Format("{0}:{1}-{2:00}-{3:00}",
                this.Reference,
                this.ProductVersion.VersionName,
                this.ProductVersion.VersionNumber,
                this.ProductVersion.SubVersionNumber);
        }
    }
}

 

The complete implementation for this Distinct() extension method is as follows:

 public static class EnumerableExtensions
{
    public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> getKey)
    {
        Dictionary<TKey, TSource> dictionary = new Dictionary<TKey, TSource>();
 
        foreach (TSource item in source)
        {
            TKey key = getKey(item);
            if (!dictionary.ContainsKey(key))
            {
                dictionary.Add(key, item);
            }
        }
 
        return dictionary.Select(item => item.Value);
    }
}

The Func in the Distinct() method calls provide a mapping from the source type, TSource, to the property type, TKey, on which the distinct comparisons will be performed.

In this implementation the determination of distinctness is made through the use of the Dictionary<TKey, TValue> generic class.

For each item in the Enumerable source a dictionary lookup is performed using the property value calculated in the Func expression. If no entry is found then the item is added to the dictionary, otherwise the duplicate item is skipped and the next item processed. Using this methodology, the dictionary will ultimately contain only distinct elements, accessible through the final Select() projection.

So how does the performance of the two options compare? As a comparison I executed 500,000 Distinct() method calls over a list containing 9 items with 3 duplicates. The IEqualityComparer approach took 2 seconds and the Func extension took 10 seconds; showing the hashing method is faster.

However if you just need to perform Distinct operations on single properties for small collections/lists, then the extension method is a much simpler programmatic approach; with reasonable performance.

Hopefully you will find this implementation useful.

Written by Carl Nolan