Finding matches in unsorted collections (or: how to find pairs of socks efficiently)

Article
07/03/2005

I was sorting through my clothes after taking them out of the drier today, and I realized that over time I've naturally come up with a strategy to optimize my work.

The problem is as follows: given an unordered collection of elements (socks), find the pairs matched by a key (sock color / model). The first phase is similar to hashing: as I take an element (sock) out of the initial collection (basket), I move it to a smaller heap (in my case: black dress socks, brown dress socks, and sports socks). Determining which heap (pile) a sock belongs to is very cheap, and breaks down a big problem into smaller problems.

This is important because, unfortunately, the next phase of the algorithm grows pretty darn fast as the size of the elements grow. Because the elements don't have an order (although they can be compared for equality), I'm forced to scan through each heap.

The algorithm is now: pick the first element (closest sock from where I am), scan through all the elements in that heap (a visual scan over the pile usually does the trick), then remove the matching elements from the heap when found. Like other 'scan unsorted stuff' algorithms, looking for an element that is not there (e.g. because the sock is tangled inside some shirt) is expensive (and annoying).

So remember - if you can't sort, try to group by part of the equality comparison key! And of course, measure, measure, measure.

Finding matches in unsorted collections (or: how to find pairs of socks efficiently)

Additional resources