Zip Me Up

Suppose you’ve got a sequence of Foos and you want to project from that a sequences of Bars. That’s straightforward using LINQ:

IEnumerable<Bars> bars = from foo in foos select MakeBar(foo);

or, without the query sugar:

IEnumerable<Bars> bars = foos.Select(foo=>MakeBar(foo));

But what if you have two sequences that you want to project from? Say you’ve got two sequences of doubles that are the same length, and you want to project a sequences of points.

The operation of “project from two sequences of the same length” is called a “zip join”, because its like doing up a zipper – you need each member of the left sequence to correspond to a member of the right sequence. In the version of the base class library that will ship with C# 4.0, we’ll be adding a Zip sequence operator to the standard extension methods.

The code to do so is pretty trivial; if you happen to need this in C# 3.0, I’ve put the source code below.

We considered adding a query syntax for zip joins. Something like

from foo in foos, bar in bars select MakeBlah(foo, bar)


from (foo, bar) in (foos, bars) select MakeBlah(foo, bar)

or some such thing, which would be syntactically transformed into

foos.Zip(bars, (foo, bar)=>MakeBlah(foo, bar))

However, we decided that adding a query syntax for a relatively obscure operation that is not actually supported by most relational databases was not worth the cost of complicating the grammar and the syntactic translation rules.

Here’s the source code:

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>
    (this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
    if (first == null) throw new ArgumentNullException(“first”);
    if (second == null) throw new ArgumentNullException(“second”);
    if (resultSelector == null) throw new ArgumentNullException(“resultSelector”);
    return ZipIterator(first, second, resultSelector);

private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>
    (IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
    using (IEnumerator<TFirst> e1 = first.GetEnumerator())
        using (IEnumerator<TSecond> e2 = second.GetEnumerator())
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);

Comments (31)

  1. Sounds similar to the and List.unzip functions in F#.

    Interesting why F# decided to have this on List instead of any sequence i.e IEnumerable<T>

  2. DRBlaise says:

    Just a small correction: I believe bar should be bars in

    foos.Zip(bars, (foo, bar)=>MakeBlah(foo, bar))

  3. Dan says:

    Although I generally like the syntax of extension methods, it seems like an odd fit for the zip method.  To me, it seems like it would be more readable if there were either static method imports (similar to java) or to just put the method on a class and not make it an extension. For example:

    instead of

    foos.Zip(bars, (foo, bar)=>MakeBlah(foo, bar))

    you would have

    Lists.Zip(foos,bars, (foo, bar)=>MakeBlah(foo, bar))

    or just

    Zip(foos,bars, (foo, bar)=>MakeBlah(foo, bar))

    or without the last lambda, and some sort of tuple support


    which could be used like this (also assuming good tuple support):

    foreach( (foo,bar) in Zip(foos,bars){

    //do something


    I like the concept, but the way it sounds when I read it doesn’t seem to convey the meaning very well (at least not for how my brain works).

    Either way its still a great addition 😀

  4. Well, nothing stops you from calling it explicitly as Enumerable.Zip(), right? I often do that with Concat() for precisely the same reasons.

    (by the way, I wish Enumerable.Concat() was made vararg…)

  5. Olivier Leclant says:

    Nice… A good part of C#3 IEnumerable<> extensions resemble what exists in Haskell, so I always felt like Zip() was missing. But Unzip() can probably not be added, unless some standard tuples are added too.

    I feel like I could have used the general syntax for zip joins to replace the majority of my ‘for’ loops (as opposed to ‘foreach’ loops). The main use of indices is for looking up in other collections in a synchronized way.

    Sure, the database querying analogy doesn’t extend to zip joins, but I mostly use linq for functional iterations anyway. 🙂

  6. Bubba blub blop says:

    I like how zip join is defined in Haskell:

    zipWith          :: (a->b->c) -> [a]->[b]->[c]

    zipWith z (a:as) (b:bs)            =  z a b : zipWith z as bs

    zipWith _ _ _                          =  []

    20 lines of code converted into 3! Granted, the Haskell version doesn’t check for nulls, but that’s because it doesn’t have null =).

  7. kosak says:

    Nice…  Though in my opinion a drawback of the given implementation is that if the enumerables are of different lengths, it silently drops the extra elements.  Would probably be better to have a test that if either enumerable hits its end, the other one hit its end at the same time.  (and of course you want to behave properly if the inputs have infinite length)

    Doing so is problematic for common cases. For example, suppose you want to zip a finite sequence against an infinite sequence of integers to form a finite indexed sequence. With your proposal you’d have to iterate the finite sequence twice, once to count it, and once to zip it with a finite index sequence. By allowing slop on either side we allow one of the sequences to be infinite. — Eric 


    Something like

         using(var e1 = first.GetEnumerator()) {
           using(var e2=second.GetEnumerator()) {
             while(true) {
               var hasMore1=e1.MoveNext();
               var hasMore2=e2.MoveNext();
               if(!hasMore1 || !hasMore2) { //one of them has ended
                 if(hasMore1 || hasMore2) { //one of them has not ended
                   throw new Exception(“Enumerables have different lengths?”);
               yield return resultSelector(e1.Current, e2.Current);

  8. Opfer says:

    Funny, I actually had implemented something like this already, although I did it a little differently.

    public static IEnumerable<IMulti<T1, T2>> Multi<T1, T2>(IEnumerable<T1> source1, IEnumerable<T2> source2)


    IEnumerator<T1> enumerator1 = source1.GetEnumerator();

    IEnumerator<T2> enumerator2 = source2.GetEnumerator();

    while (enumerator1.MoveNext() && enumerator2.MoveNext())

    yield return new Multi<T1, T2>(enumerator1.Current, enumerator2.Current);



    public interface IMulti<T1, T2>


    T1 Data1 { get; }

    T2 Data2 { get; }


    class Multi<T1, T2> : IMulti<T1, T2>


    readonly T1 data1;

    readonly T2 data2;

    public T1 Data1 { get { return data1; } }

    public T2 Data2 { get { return data2; } }

    public Multi(T1 data1, T2 data2)


    this.data1 = data1;

    this.data2 = data2;



    That way, it can also be used with a foreach loop to iterate through sequences of the same length. The only drawback is that you get ugly identifiers like Data1 an Data2. I wish there was something like:

    foreach (Foo foo, Bar bar in foos, bars) ;

  9. RichB says:

    I understand why you’ve decided against the query comprehension syntax. But it would be great if you could add it to System.Linq.Enumerable so we still have parity with Python.

  10. Ben says:

    I would also have checked lengths first.

    If you check lengths first, you have to iterate both sequences twice. That potentially doubles the cost to every call. — Eric

    And would prefer static syntax, but that may just be the influence of F#

    Love you blog, though I only understand 70% of it (but that’s better than a few years ago – so either you’re dumbing down, or I’m smarting up)

  11. This is a good thing, but it seems to be somewhat limited. I would prefer to have a index(foo) pseudo-function in LINQ syntax to achieve maximum flexibility when joining by index.

    Also, the name is confusing — Zip is a very well known archiving method, and the functional meaning will be unfamiliar to many developers. I would name it something like Merge, which defines the semantics.

  12. Anthony Tarlano says:

    @Kosak I see IEnumerable<T>’s of different lengths as a feature not a bug..

    The fact that Eric didn’t require the the zip to work on bounded sequences, where the upper bound (Length) is known aprori, is a feature not a bug. It will allow the methods use with unbounded generator based streams.

    I for one ask, please don’t add any checking to make sure the two IEnumerable<T>’s are the same cardinality.

  13. Ken Hirsch says:

    In a slightly more abstract view, Zip and Unzip are really the same function, and that function is Transpose. Zip turns a pair of lists (2xN) into a list of pairs (Nx2). Unzip just transposes the dimensions again.  In APL and derivatives (J, K, etc.) it actually works that way because lists, tuples, and function parameters are either the same thing or interchangeable.

  14. Weeble says:

    I use two variations of a method like this. Zip is as above – it stops once either enumerable expires. ZipEqual throws an exception if one enumerable expires before the other. It doesn’t do any sort of checking in advance, it just throws the exception during iteration when one expires and the other doesn’t. I tend to use ZipEqual whenever possible because it makes it clear that the sequences are expected to be the same length. I think I got the name and the behaviour from ML. Still, if you’re only going to provide one in the standard library I think it should be the more flexible Zip.

  15. Thank you for submitting this cool story – Trackback from DotNetShoutout

  16. Joe says:

    Clearly the query syntax is pared down to some notion of what is most commonly used, but I’m not sure whether or not an operator is supported by the majority of relational databases is a good criteria to base that decision on.  The query syntax is part of Linq, not part of a specific ORM like LinqToSql or the EF.

  17. Jason Haley says:

    Interesting Finds: May 8, 2009

  18. Sam Saffron says:

    I would go with an implementation that matches Ruby for objects:

    irb> [1,2,3].zip([4,5])

    => [[1, 4], [2, 5], [3, nil]]

    irb> [1,2,3].zip([4,5,6,7])

    => [[1, 4], [2, 5], [3, 6]]

    Either exception out or have nil as the second member of the tuple when the zipee is shorter.

    Keep the existing behavior for zippor is shorter than the zipee

    Also this seems a bit more natural and as @OlivierLeclant said in an earlier comment, it is unzippable,  

    public static IEnumerable<Tuple<TFirst, TSecond>> Zip<TFirst, TSecond> (this TFirst first, TSecond second);

  19. kosak says:

    Anthony, I agree, and the point is that the code I posted works perfectly well on unbounded streams, as should have been obvious from both the code and the parenthetical comment in my explanation.

    Nothing in my code requires that the enumerables are bounded.  The only additional requirement that my code imposes is that IN THE CASE THAT the code encounters the end of one stream, it will check that the other stream terrminates there also.

  20. Anthony Tarlano says:

    @Kosak  : I just don’t think it is an Exception to have different lengths. Are you telling me you code doesn’t throw on different lengths?

    """ throw new Exception("Enumerables have different lengths?"); """

  21. Greg says:

    Can you include how that would be done without using LINQ and possibly a real world scenario where this could be used?

  22. Ed says:

    Reviewing what I learned from your excellent post on naming things, I have to admit that Andrey Shchekin’s comment is spot on.

  23. Lucas says:

    Why did you choose a 2-method implementation? What is the need for the private ZipIterator()? Can’t you yield return from Zip()?

    Good question. I covered that already a while back:

    — Eric


  24. Lucas says:

    Thanks Eric!! I use iterators all the time and was not aware that my null checks (just like in your psychic debugging example) where being deferred… doh!

  25. "However, we decided that adding a query syntax for a relatively obscure operation that is not actually supported by most relational databases was not worth the cost of complicating the grammar and the syntactic translation rules."

    That’s exactly the reason why I think Scala made the right decision to put Linq-similar things in libraries and instead create an expandable language. While Linq surely feels nice it can’t be extended by anyone except the C# developers…

  26. Scala still has syntactic sugar for query comprehensions, no? So it exposes "filter" and "map" already, and in a non-extensible way, too.

    Sure, C# goes further than that, with joins and ordering, but fundamentally there’s no difference. You still want the common operations to be sugared, the only question is where to draw the dividing line.

  27. Eric, will there be an Unfold in version 4?

    Not to my knowledge. But it would be trivial to write such a beast yourself. How about this?

    static IE<T> Unfold<S, T>(this S state, Func<S, bool> done, Func<S, S> next, Func<S, T> transform)
      while (true) 
        yield return transform(state);
        if (done(state)) yield break;
        state = next(state);

    — Eric

  28. Pop Catalin says:

    Daniel Lehmann wrote:

    > That’s exactly the reason why I think Scala made the right decision to put Linq-similar things in libraries and instead create an expandable language. While Linq surely feels nice it can’t be extended by anyone except the C# developers…

    Linq is a pattern, anyone can adhere to the Linq pattern, which means that the implementation part of Linq is extensible at the core, the only part of Linq that is not extensible is the comprehension syntax (you can’t add new comprehension keywords, but you can make the comprehension syntax translate to entirely different "member" calls, usually methods). Other than adding new language keywords, you can extend Linq in any possible way imaginable (you can completely replace Linq to SQL, or Linq to XML, or Linq to Objects with your implementation, or simply mix other implementations with your own selectively)…

    The comprehension syntax in linq is simply a loose pattern matching (that works by translating to appropriate "member" calls (not just methods), using regular member resolution rules), the rest of Linq is straight forward libraries with standard CLR types that adhere to the linq pattern, and which can be implemented by anyone.

  29. Re: Unfold, thanks! I already wrote my own in a simpler form where the transform part is left out (just stick a Select after it if you need that). I guess everyone using Linq is writing their own slightly different version of it.

  30. Greg says:

    The funny thing is that the code to use the zip join is overly expensive to maintain and develop for in production quality commercial software.  The cost is not really in lines of code but in the mental effort needed to understand and verify what is actually happening (easy to verify in a single 10 line chunk but not easy to do when you have 500+ methods using similar coding constructs).

    It fails the Monday morning, did not sleep the night before, code takes too long to understand test (even for code you wrote a week ago).

    list ret = new list()

    int x = max( list1.length, list2.length )

    for (int ctr = 0; ctr < x; ctr++)


     element o1 = o2 = null

     if (x < list1.length)     o1 = list1[x];

     if (x < list2.length)     o2 = list2[x];

     e = ret.addnew(o1, o2)


    Concentrating effort on better high level algorithms, data base organization and project organization will yield many times more productivity.

    The straight line sequential code is easier to take over and maintain by a developer unfamiliar with the rest of the application’s code.

  31. Neil says:

    I don’t claim to know how iterators work in C#, but the way another language implementation throws an iteration exception when you run off the end of the list allows you to simplify the zip operation something like this:

    function zip(left, right) {

     for (;;)

       yield [,];