Introducing HashSet [Kim Hamilton]

HashSet<T> is in our latest CTP, and you can find it in the System.Collections.Generic namespace. The naming discussion over the last month has motivated me to recap some naming highlights for HashSet, so hang in till the end if you’re interested.

HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, and Contains, but because it uses a hash-based implementation, these operations are O(1) on average. (List<T>, by contrast, is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference.

    HashSet<int> theSet1 = new HashSet<int>();
    theSet1.Add(1);
    theSet1.Add(2);
    theSet1.Add(2);
    // theSet1 contains 1,2

    HashSet<int> theSet2 = new HashSet<int>();
    theSet2.Add(1);
    theSet2.Add(3);
    theSet2.Add(4);
    // theSet2 contains 1,3,4

    theSet1.UnionWith(theSet2);
    // theSet1 contains 1,2,3,4
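The sample above only shows union. As a minimal sketch of the other two operations mentioned, intersection and symmetric difference, the following uses IntersectWith and SymmetricExceptWith. The class name SetOperationsSketch and its two helper methods are hypothetical names introduced for illustration; each helper copies its first argument so that the inputs are left unchanged.

```csharp
using System;
using System.Collections.Generic;

static class SetOperationsSketch
{
    // Hypothetical helper: returns a new set holding the elements found in both inputs.
    public static HashSet<int> Intersection(HashSet<int> a, HashSet<int> b)
    {
        HashSet<int> result = new HashSet<int>(a); // copy, so 'a' is left untouched
        result.IntersectWith(b);                   // modifies 'result' in place
        return result;
    }

    // Hypothetical helper: returns a new set holding the elements found in exactly one input.
    public static HashSet<int> SymmetricDifference(HashSet<int> a, HashSet<int> b)
    {
        HashSet<int> result = new HashSet<int>(a);
        result.SymmetricExceptWith(b);
        return result;
    }

    static void Main()
    {
        HashSet<int> theSet1 = new HashSet<int> { 1, 2 };
        HashSet<int> theSet2 = new HashSet<int> { 1, 3, 4 };

        // Only 1 is in both sets.
        Console.WriteLine(Intersection(theSet1, theSet2).SetEquals(new[] { 1 }));              // True

        // 2 is only in theSet1; 3 and 4 are only in theSet2.
        Console.WriteLine(SymmetricDifference(theSet1, theSet2).SetEquals(new[] { 2, 3, 4 })); // True
    }
}
```

SetEquals is used for the checks because HashSet is unordered, so the enumeration order of the results should not be relied on.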

HashSet’s default Add operation returns a bool letting you know whether the item was added, so in the code sample above, you could check the return value to see whether the item was already in the set.

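    // In the first sample, the two calls that add 2 return different values:
    bool added = theSet1.Add(2); // added is true: 2 was not yet in the set
    added = theSet1.Add(2);      // added is false: 2 is already in the set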

If you’re familiar with our ICollection<T> interface, notice that this means ICollection<T>.Add (returning void) has an explicit implementation, allowing HashSet<T> to introduce its own Add.
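To illustrate the explicit implementation, here is a small sketch that adds to the same HashSet<int> both directly and through an ICollection<int> reference. The class name ExplicitAddSketch and the helper AddViaInterface are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class ExplicitAddSketch
{
    // Hypothetical helper: adds through the interface view and returns the resulting count.
    public static int AddViaInterface(HashSet<int> set, int item)
    {
        ICollection<int> asCollection = set; // same object, seen as ICollection<int>
        asCollection.Add(item);              // returns void; a duplicate is skipped silently
        return set.Count;
    }

    static void Main()
    {
        HashSet<int> theSet = new HashSet<int>();

        bool added = theSet.Add(1);                    // HashSet<T>.Add returns a bool
        Console.WriteLine(added);                      // True

        Console.WriteLine(AddViaInterface(theSet, 1)); // 1 (duplicate ignored)
        Console.WriteLine(AddViaInterface(theSet, 2)); // 2
    }
}
```

Code written against ICollection<T> keeps working, but it cannot tell whether the item was actually added; only callers holding a HashSet<T> reference see the bool result.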

A note on uniqueness: HashSet determines equality according to the IEqualityComparer<T> you specify, or, if you don’t specify one, the default EqualityComparer<T> for the type. In the above example we didn’t specify a comparer, so it uses the default for Int32. In the next example, we’ll use an OddEvenComparer, which considers items equal if they are both even or both odd.

    class OddEvenComparer : IEqualityComparer<int> {
        public OddEvenComparer() {}

        // Two ints are equal if they have the same parity.
        public bool Equals(int x, int y) {
            return (x & 1) == (y & 1);
        }

        // Equal items must hash alike, so hash on parity as well.
        public int GetHashCode(int x) {
            return (x & 1);
        }
    }

    ...

    // Now use the comparer
    HashSet<int> oddEvenSet = new HashSet<int>(new OddEvenComparer());
    oddEvenSet.Add(1);
    oddEvenSet.Add(3);
    oddEvenSet.Add(4);
    // oddEvenSet contains 1,4; it considered 1 and 3 equal.

Notice the name UnionWith in the first example. UnionWith, like the other set operations, modifies the set it’s called on and doesn’t create a new set. This distinction is important because the LINQ operations Union, Intersect, etc. on IEnumerable leave their inputs alone and return a new sequence. So HashSet’s methods aren’t duplicating LINQ; they’re provided in case you want to avoid creating a new collection, and they’re distinguished by the With suffix.
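The difference can be seen side by side in a short sketch. The class name UnionSketch and the helper Run are hypothetical names used only for this illustration; the sketch assumes System.Linq is available for the Union extension method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class UnionSketch
{
    // Hypothetical helper: returns { size of the LINQ union,
    //                                size of theSet1 after the LINQ call,
    //                                size of theSet1 after UnionWith }.
    public static int[] Run()
    {
        HashSet<int> theSet1 = new HashSet<int> { 1, 2 };
        HashSet<int> theSet2 = new HashSet<int> { 1, 3, 4 };

        // LINQ: the result is a separate sequence; theSet1 is not modified.
        IEnumerable<int> union = theSet1.Union(theSet2);
        int linqCount = union.Count();
        int countAfterLinq = theSet1.Count;

        // HashSet: theSet1 itself is modified; UnionWith returns nothing.
        theSet1.UnionWith(theSet2);
        int countAfterUnionWith = theSet1.Count;

        return new[] { linqCount, countAfterLinq, countAfterUnionWith };
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]); // 4
        Console.WriteLine(counts[1]); // 2
        Console.WriteLine(counts[2]); // 4
    }
}
```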

Now for some naming fun, which will demonstrate some other framework guidelines. We would have liked to name this feature Set. This is because it’s preferred to use a common name rather than one that reveals details about the implementation. To borrow an example from Krzysztof Cwalina and Brad Abrams’s book Framework Design Guidelines, a type used to submit print jobs to a print queue should be named Printer, and not PrintQueue. Applying this guideline to this class: HashSet, while more technically precise, isn’t as recognizable as Set. You can see this guideline in other class names in the System.Collections.Generic namespace: List<T> instead of ArrayList<T>, Dictionary<TKey, TValue> instead of Hashtable<TKey, TValue>.

This brings up the question of whether naming it Set would have been a bad idea if we add other sets in the future, such as an OrderedSet. However, a hash-based unordered set can reasonably be considered the “go-to” set because of its good performance, so distinguishing it with the name Set would still be acceptable.

Any guesses as to why we didn’t go with the name Set?