Restating the problem


A problem statement:

I am trying to loop though a sequence of strings. How can I determine when I am on the last item in a sequence? I don’t see how to do it with “foreach”.

Indeed, “foreach” does not make it easy to know when you are almost done. Now, if I were foolish enough to actually answer the question as stated, I’d probably say something like this:

You can do so by eschewing “foreach” and rolling your own loop code that talks directly to the enumerator:

IEnumerable<string> items = GetStrings();
IEnumerator<string> enumtor = items.GetEnumerator();
if (enumtor.MoveNext())
{
  string current = enumtor.Current;
  while(true)
  {
    if (enumtor.MoveNext())
    {
      DoSomething(current); // current is not the last item in the sequence. 
      current = enumtor.Current;
    }
    else
    {
      DoSomethingElse(current); // current is the last item in the sequence.
      break;
    }
  }
}
else
{
  // handle case where sequence was empty
}

Yuck. This is horrid. A fairly deep nesting level, some duplicated code, the mechanisms of iteration overwhelm any semantic meaning in the code, and it all makes my eyes hurt.

When faced with a question like that, rather than writing this horrid code I’ll usually push back and ask “why do you want to know?”

I am trying to walk though a sequence of strings and build a string like “a,b,c,d”.  After each item I want to place a comma except not after the last item. So I need to know when I’m on the last item.

Well knowing that certainly makes the problem a whole lot easier to solve, doesn’t it? A whole bunch of techniques come to mind when given the real problem to solve:

First technique: find an off-the-shelf part that does what you want.

In this case, call the ToArray extension method on the sequence and then pass the whole thing to String.Join and you’re done.

Second technique: Do more work than you need, and then undo some of it.

Using a string builder, to build up the result, put a comma after every item. Then “back up” when you’re done and remove the last comma (if there were any items in the sequence, of course).

Third technique: re-state the problem and see if that makes it any easier.

Consider this equivalent statement of the problem:

I am trying to walk though a sequence of strings and build a string like “a,b,c,d”. Before each item I want to place a comma except not before the first item. So I need to know when I’m on the first item.

And suddenly the problem is much, much simpler. It’s easy to tell if you’re on the first item in a sequence by just setting a “I’m at the first item” flag outside the foreach loop and clearing it after going through the loop.

When I have a coding problem to solve and it looks like I’m about to write a bunch of really gross code to do so, it’s helpful to take a step back and ask myself “Could I state the problem in an equivalent but different way?” I’ll try to state a problem that I’ve been thinking about iteratively in a recursive form, or vice versa. Or I’ll try to find formalizations that express the problem in mathematical terms. (For example, the method type inference algorithm can be characterized as a graph theory problem where type parameters are nodes and dependency relationships are edges.) Often by doing so I find that the new statement of the problem corresponds more directly to clear code.

Comments (34)

  1. Stu says:

    Don’t forget .Aggregate((x, y) => x + "," + y);

  2. pminaev says:

    Actually, do forget that – it’s not any clearer than String.Join, and using operator+ for strings in a loop is frowned upon for a reason.

  3. Joren says:

    I realise that this is beside the point, Eric, but I think your loop is slightly more horrid than it needs to be. I’d write the body of the outer if-statement in a different way:

    string current = enumtor.Current;

    // ‘current’ may or may not be the last element

    while (enumtor.MoveNext())

    {

    // ‘current’ is not the last element

    DoSomething(current);

    current = enumtor.Current;

    // ‘current’ may or may not be the last element

    }

    // ‘current’ is the last element

    DoSomethingElse(current);

    Still plenty horrible, but at least it’s less nested.

  4. @pminaev: "using operator+ for strings in a loop is frowned upon for a reason."

    It’s because every operator+ as used in the .Aggregate() call will generate a string.  This is (of course) by design.

    Unfortunately, it means that you will produce N-1 strings for each such Aggregate() call, where N is the number of elements in your list, so if you have a list with 5 elements, 4 strings will be produced, 3 of which are garbage and will never be used again.

    In short, you’ll be making lots of work for the GC for minimal gain.

    A StringBuilder version, in contrast, would (hopefully) produce fewer internal temporary objects (due to amortized array reallocations within StringBuilder), to wit:

    list.Skip(1).Aggregate(

     new StringBuilder().Append(list.First()),

     (buf, e) => buf.Append(",").Append(e),

     buf => buf.ToString());

    I did some performance testing awhile back, and by far the fastest version was ToString() + string.Join().  This results in only O(log N) temporaries (due to ammortized array resizing), and the string.Join() call can efficiently pre-allocate the buffer for the resulting string without generating any temporaries.

  5. Ajai Shankar says:

    "For example, the method type inference algorithm can be characterized as a graph theory problem where type parameters are nodes and dependency relationships are edges."

    I’d like to request a full blog post that illustrates the above!

    As a developer I’ve personally struggled to make the leap from some problem that I need to solve and relate it to some algorithm about graph or trees…

  6. chris says:

    In Java, I do this by appending to a StringBuffer, but testing that the buffer is not empty before adding the leading comma.  Granted it doesn’t work right with empty strings, but that’s often not relevant.

  7. Peter Wilson says:

    I like the following which I found on the web recently:

    StringBuilder SB = new StringBuilder();

    String Comma = "";

    foreach item in container {

     SB.add(comma);

     SB.add(Item);

     Comma = ",";

    }

    result = SB.ToString();

  8. Bryan Watts says:

    String.Join, since it performs the requested work, is by far the best choice.

    I prefer an extension method for call-site neatness:

    public static string Join(this IEnumerable<string> source, string separator)

    {

     return String.Join(separator, source.ToArray());

    }

    public static string Join(this IEnumerable<string> source, string separator, int startIndex, int count)

    {

     return String.Join(separator, source.ToArray(), startIndex, count);

    }

  9. MartinHN says:

    I think this is one of the reasons why you often see some rather thoughtless code. Code as if the author went straight to the keyboard after project description etc.

  10. naspinski says:

    I could be way off here, but isn’t this really a trivial problem?  Why are you maing it so difficult?

    string newString = string.Empty;

    for (int i = 0; i < items.Count(); i++)

       newString += items[i] + (i == items.Count() – 1 ? string.Empty : ",");

  11. naspinski says:

    **I ignored performance rules above, just showing a simple way to get the solution

  12. configurator says:

    @naspinski: Since this is an IEnumerable and not an array, getting the Count() is an O(n) operation. which you are doing 2n times, like accessing by index, which is also an O(n) operation. If you are using a ToArray(). you already have a library function to do that anyway.

  13. shiv patil says:

    Reverse your sequence, and pick the first item. You now have the item that was originally the last item. You should separately the handle the case of zero length sequence.

  14. Gabriel Ross says:

    Well, you could just use foreach to add a comma after each item, then take the concatenated string and pop off the last char.

  15. mt says:

    I’ve always liked (pseudo code):

    if (it.hasNext()) {

     sb.append(it.current);

     while (it.hasNext()) {

       sb.append(‘,’);

       sb.append(it.current);

     }

    }

  16. Aaron G says:

    Note to readers: When using the second technique, you’d better make sure that your loop has actually executed at least once before trying to slice off the last character.

    The most reliable approach, and I believe the one that the framework itself uses, is to actually check the output string length and forget about the sequence itself:

    StringBuilder sb = new StringBuilder();

    foreach (B item in items)

    {

     if (sb.Length > 0)

     {

       sb.Append(",");

     }

     sb.Append(item.SomeProperty);

    }

    I use this so often I have a code template for it.  It’s yet another way of restating the same problem: "I want to build a string of tokens with a comma before each token, except at the very beginning of the string."

    In fact, that problem statement actually makes a lot more sense in the case when you want to build the string out of some, but not all elements in the sequence, and you don’t necessarily know ahead of time if the very first item will be included.

  17. Steven says:

    I see no LINQed solution in the posts, so here’s one:

    var collection = …;

    if (collection.Count() > 0)

    {

       foreach (var element in collection.Take(collection.Count() – 1))

       {

           // Process all but the last

       }

       // Process all

       var lastElement = collection.Last();

    }

  18. pminaev says:

    This is a rather inefficient solution, since it calls Count() twice (which is O(n)), and it iterates the whole sequence twice as well (once for Take(), second time for Last()). On the whole, it ends up iterating the entire sequence 4 times: 2 Count() calls, then Take(), then Last(). With return value of Count() memoized, it would be 3, which is still not so good.

  19. Steven says:

    @pmineaev: "premature optimization is the root of all evil." :-). Besides, Count() is only o(n) when the supplied collections does not implement ICollection<T>. When it does, Count() WILL be o(1). Besides, I think my solution is the most readable solution given so far 😉

  20. Joren says:

    Re Steven, pminaev: The Any() extension method is a better way than Count() to determine if a collection is empty.

  21. Dave says:

    Looking in on this discussion, it seems we have found two design flaws in C# (and other similar languages):  (1) hardheaded string semantics where the compiler can’t recognize a loop with a string-append statement and avoid create a mountain of new objects obviously dropped every iteration, and

    This is arguably an implementation flaw, but it is certainly not a design flaw. I am certainly well aware of how to write such an optimization; I’ve done it in a number of languages in the past. JScript .NET has this optimization. We have no semantic or spec restriction preventing us from making this optimization in C#. We have not done so because we have other higher priorities and a limited budget.

    (2) hardheaded collection structures with incompatible interfaces and the inefficient implementation details foisted onto the user (c’mon: List, Array, ICollection, Iterator, Enumerator, foreach, for, while, now Linq).  Am I being naive?  No, since for example the Haskell language and its GHC compiler have efficiently bypassed these problems.  A string is a string and a list is a list.  (Admittedly, every language drifts off with sloppy add-ons like ByteString.)

    This is also not a design flaw, this is a design choice. Some tool makers opt for the strategy of building a specific variation on a tool to solve a specific problem really well. The down side of this approach is you end up with a confusing multiplicity of tools. Some tool makers opt for the strategy of building one really flexible tool and using it for everything, even if it’s not the best tool for a particular task. Both approaches have their benefits and drawbacks. Good design is the art of making reasonable compromises in the face of benefits and drawbacks; if you for philosophical or practical reasons don’t like that we’ve chosen the former approach then you might want to stick with tools made by designers who took the latter approach. — Eric

  22. David Cuccia says:

    @ Aaron G – Yours is my favorite.

    Re: "I use this so often I have a code template for it". If you’re using C#, why not just write an extension method and be done with it:

    string myString = items.Select(B b => b.SomeProperty).ConcatWith(", ");

    public string ConcatWith(this IEnumerable<string> strings, string concatString)

    {

       StringBuilder sb = new StringBuilder();

       foreach (string s in strings)

       {

        if (sb.Length > 0)

        {

          sb.Append(concatString);

        }

        sb.Append(s);

       }

       return sb.ToString();

    }

    @nc Thanks for keeping the discussion honest. 🙂

  23. Jason says:

    (reduce (lambda a,b (+ a (+ ‘,’ b))) list-of-stuff)

  24. fred says:

    // See: naspinski

    string newString = null;

    for (int i = 0, count = items.Count; i < count; i++)
    {
     newString += items[i] + (i == count – 1 ? null as string : “, “);
    }

    FYI, sequences do not necessarily have a “count” property. To get the size of an arbitrary sequence you possibly have to iterate the whole thing. That’s what makes this problem harder. — Eric

  25. mokeefe says:

    I think some missed the point of the article, and even an early post on String.Join. So in reply to the posts more than the article.

    If your source data is large enough for this approach to have a negative impact I would question the model in use and suggest refactoring, but then I don’t know your domain :-).

    Note that this takes a single statement to accomplish.

    IEnumerable<String> strings = new String[] { "One", "Two" };

    String result = String.Join(", ", strings.ToArray());

    //result = "One, Two"

  26. michaelb says:

    Obligatory functional programming (e.g. F#) plug:

    let rec join delim = function

    | [] -> ""

    | x::[] -> x

    | x::rest -> x + delim + (join delim rest)

  27. Jeff Lorenzini says:

    I agree with a couple of other posters, just check to see if your result has something in it, if it does, add the comma. it’s quite simple.

  28. Steve Wagner says:

    And, of course, the silly Stack-derived class is completely unnecessary:

    using System.Collections.Generic;

    using System.Linq;

    using System.Text;

    using NUnit.Framework;

    namespace CommaQuibble

    {

       [TestFixture]

       public class QuibblerFixture

       {

           [Test]

           public void EmptyList()

           {

               IEnumerable<string> list = new[] {""};

               string quibbled = GetQuibbled(list);

               Assert.AreEqual("{}", quibbled);

           }

           private static string GetQuibbled(IEnumerable<string> list)

           {

               Quibbler quibbler = new Quibbler();

               return quibbler.Quibble(list);

           }

           [Test]

           public void OneItemInList()

           {

               IEnumerable<string> list = new[] {"ABC"};

               string quibbled = GetQuibbled(list);

               Assert.AreEqual("{ABC}", quibbled);

           }

           [Test]

           public void TwoItemsInList()

           {

               string[] list = new[] { "ABC", "DE" };

               string quibbled = GetQuibbled(list);

               Assert.AreEqual("{ABC and DE}", quibbled);

           }

           [Test]

           public void ThreeItemsInList()

           {

               string[] list = new[] { "ABC", "DE", "ZYXWV" };

               string quibbled = GetQuibbled(list);

               Assert.AreEqual("{ABC, DE and ZYXWV}", quibbled);

           }

           [Test]

           public void ManyMany()

           {

               string[] list = new[] { "ABC", "DE", "ZYXWV", "FG", "UT", "HI", "SR", "JK", "QP", "LM", "NO" };

               string quibbled = GetQuibbled(list);

               Assert.AreEqual("{ABC, DE, ZYXWV, FG, UT, HI, SR, JK, QP, LM and NO}", quibbled);

           }

           [Test]

           public void ViacheslavIvanov()

           {

               string[] list = new[] { "", ",", "}" };

               string quibbled = GetQuibbled(list);

               Assert.AreEqual("{, , and }}", quibbled);

           }

       }

       public class Quibbler

       {

           public string Quibble(IEnumerable<string> enumerable)

           {

               StringBuilder builder = new StringBuilder("{");

               string last = enumerable.Last();

               string first = enumerable.First();

               if(first == last)

               {

                   builder.Append(first);

               }

               else if(first != last)

               {

                   IEnumerable<string> rest = enumerable.Except(new[] { last });

                   string penultimate = rest.Last();

                   foreach (string item in rest)

                   {

                       builder.Append(item);

                       if (item != penultimate)

                           builder.Append(", ");

                   }

                   builder.AppendFormat(" and {0}", last);

               }

               builder.Append("}");

               return builder.ToString();  

           }

       }

    }

  29. Mo says:

    I think you’re all arguing besides the point. The point of this post – for me – was to THINK about the problem, ie. restating it. That’s a mind technique, not implementation. Everything else is just illustration.

    Thanks for the post. Looking forward to more "restating" examples. Said that, I support Ajai’s request for a full article on graph theory problems. 🙂

  30. Smartass says:

    The proper way to think of lists is not as having a comma after every item, but as having commas BEFORE every item except the first.

    // usage:

    // ListSep comma;

    // for(int i=0; i < N; i++) out<<comma<< X;

    // prints:

    // 1, 2, 3, 4, 5

    struct ListSep {

    int iter;

    const char *sep;

    ListSep(const char *sep_ = ", ") : sep(sep_) {

    iter=0;

    }

    friend cstring &operator <<(cstring &str, ListSep &sep) {

    if (sep.iter) {

    str<<sep.sep;

    }

    sep.iter++;

    return str;

    }

    };

  31. Smartass says:

    I see this point is already made in the post. Didn’t read to the end.

    I note that the denouement is also foreshadowed by the post’s title. Perhaps you did not read that either. — Eric

     

  32. Phil says:

    Got sick of having to repeatedly do this kind of thing myself, so wrote a utility class with set of extension methods for exactly this.  I’ve removed a bunch of overloads of these extension methods but the following includes probably the most used:

       /// <summary>

       /// Utility class for dealing with collection classes

       /// </summary>

       public static class CollectionUtils

       {

           /// <summary>

           /// Delegate used for executing actions using CollectionUtils.ForEach method

           /// </summary>

           public delegate void ForEachAction<T>(T item, bool isFirst, bool isLast);

           /// <summary>

           /// Executes action delegate against a collection, passing boolean

           /// indicator to specify whether the element is the first and/or last

           /// in the collection.

           /// </summary>

           public static void ForEach<T>(this IEnumerable<T> collection,

               ForEachAction<T> action)

           {

               bool isFirstItem = true;

               using (IEnumerator<T> enumerator = collection.GetEnumerator())

               {

                   bool isNotEmpty = enumerator.MoveNext();

                   if (isNotEmpty)

                   {

                       T current = enumerator.Current;

                       while (enumerator.MoveNext())

                       {

                           action(current, isFirstItem, false /* not last */ );

                           if (isFirstItem) isFirstItem = false;   // clear isFirstItem flag if set

                           current = enumerator.Current;

                       }

                       action(current, isFirstItem, true /* is last */);

                   }

               }

           }

           /// <summary>

           /// Returns a formatted string for a collection

           /// </summary>

           /// <remarks>

           /// The string will be formatted so that a space seperates each item in

           /// the collection.  

           /// </remarks>

           public static string FormatCollection<T>(this IEnumerable<T> collection,

               IFormatProvider formatProvider)

           {

               return FormatCollection(collection, "{0} ", "{0}", formatProvider);

           }

           /// <summary>

           /// Returns a formatted string for a collection

           /// </summary>

           public static string FormatCollection<T>(this IEnumerable<T> collection,

               string formatExceptLastItem, string formatLastItem,

               IFormatProvider formatProvider)

           {

               if (collection == null) throw new ArgumentNullException("collection");

               if (formatExceptLastItem == null) throw new ArgumentNullException("formatExceptLastItem");

               if (formatLastItem == null) throw new ArgumentNullException("formatLastItem");

               // allowing formatProvider == null, formatting usually allows a null parameter

               // .NET framework probably assumes CultureInfo.CurrentCulture in this case

               StringBuilder stringBuilder = new StringBuilder();

               collection.ForEach(

                   delegate(T item, bool isFirst, bool isLast)

                   {

                       if (isLast)

                       {

                           stringBuilder.AppendFormat(formatProvider, formatLastItem, new object[] { item });

                       }

                       else

                       {

                           stringBuilder.AppendFormat(formatProvider, formatExceptLastItem, new object[] { item });

                       }

                   });

               // return the formatted string

               return stringBuilder.ToString();

           }

           /// <summary>

           /// Format a collection of bytes (including byte arrays) to a hex

           /// string.  The hex string returned will be in this example format

           /// ‘0FEE9A’.

           /// </summary>

           public static string FormatHexString(this IEnumerable<byte> data,

               IFormatProvider formatProvider)

           {

               return data.FormatCollection("{0:X2}", formatProvider);

           }

           /// <summary>

           /// Returns a formatted string for a collection

           /// </summary>

           public static string FormatCollection<T>(this IEnumerable<T> collection,

               string formatFirstItem, string formatLastItem,

               string formatSingleItem, string formatMiddleItems,

               IFormatProvider formatProvider)

           {

               if (collection == null) throw new ArgumentNullException("collection");

               if (formatFirstItem == null) throw new ArgumentNullException("formatFirstItem");

               if (formatLastItem == null) throw new ArgumentNullException("formatLastItem");

               if (formatSingleItem == null) throw new ArgumentNullException("formatSingleItem");

               if (formatMiddleItems == null) throw new ArgumentNullException("formatMiddleItems");

               // allowing formatProvider == null, formatting usually allows a null parameter

               // .NET framework probably assumes CultureInfo.CurrentCulture in this case

               StringBuilder stringBuilder = new StringBuilder();

               collection.ForEach(

                   delegate(T item, bool isFirst, bool isLast)

                   {

                       if (isFirst)

                       {

                           if (isLast)

                           {

                               // first and last set, so it’s a single item collection

                               stringBuilder.AppendFormat(

                                   formatProvider, formatSingleItem, new object[] { item });

                           }

                           else

                           {

                               // first item is set but last isn’t, so it’s not a single item collection

                               stringBuilder.AppendFormat(

                                   formatProvider, formatFirstItem, new object[] { item });

                           }

                       }

                       else if (isLast)

                       {  

                           // last is set but first isn’t so it’s not a single item

                           stringBuilder.AppendFormat(formatProvider, formatLastItem, new object[] { item });

                       }

                       else

                       {

                           // neither isFirst or isLast are set, so it’s a ‘middle’ item

                           stringBuilder.AppendFormat(formatProvider, formatMiddleItems, new object[] { item });

                       }

                   });

               // return the formatted string

               return stringBuilder.ToString();

           }

           /// <summary>

           /// Returns a formatted string for a collection using the current culture

           /// </summary>

           public static string FormatCollection<T>(this IEnumerable<T> collection,

               string formatFirstItem, string formatLastItem,

               string formatSingleItem, string formatMiddleItems)

           {

               return FormatCollection(collection,

                   formatFirstItem, formatLastItem,

                   formatSingleItem, formatMiddleItems,

                   System.Globalization.CultureInfo.CurrentCulture);

           }

           /// <summary>

           /// Returns a formatted string for a collection

           /// </summary>

           public static string FormatCollection<T>(this IEnumerable<T> collection,

               string formatExceptLastItem, string formatLastItem,

               IFormatProvider formatProvider)

           {

               if (collection == null) throw new ArgumentNullException("collection");

               if (formatExceptLastItem == null) throw new ArgumentNullException("formatExceptLastItem");

               if (formatLastItem == null) throw new ArgumentNullException("formatLastItem");

               // allowing formatProvider == null, formatting usually allows a null parameter

               // .NET framework probably assumes CultureInfo.CurrentCulture in this case

               StringBuilder stringBuilder = new StringBuilder();

               collection.ForEach(

                   delegate(T item, bool isFirst, bool isLast)

                   {

                       if (isLast)

                       {

                           stringBuilder.AppendFormat(formatProvider, formatLastItem, new object[] { item });

                       }

                       else

                       {

                           stringBuilder.AppendFormat(formatProvider, formatExceptLastItem, new object[] { item });

                       }

                   });

               // return the formatted string

               return stringBuilder.ToString();

           }

           /// <summary>

           /// Returns a formatted string for a collection

           /// </summary>

           public static string FormatCollection<T>(this IEnumerable<T> collection,

               string formatExceptLastItem, string formatLastItem)

           {

               return FormatCollection(collection, formatExceptLastItem, formatLastItem,

                   System.Globalization.CultureInfo.CurrentCulture);

           }

       }

  33. Matthew Watson says:

    Jon Skeet blogged about this back in 2007. He exhibited a SmartEnumerable type that wraps an enumerable and provides information such as IsFirst and IsLast:

    http://msmvps.com/blogs/jon_skeet/archive/2007/07/27/smart-enumerations.aspx

  34. pgeerkens@hotmail.com says:

    Reading through your old posts to understand C# better, and I couldn't stop myslef from adding two thoughts to this topic:

    1) A trailing comma is disallowed by the spec, implicitly, but in many cases a leading comma is fine because in (some/many) cases  it will be added anyways; so ask the question.

    2) Several lives back I worked on an Extended Management Information System that auto-generated SQL. A colleague was complaining one day abou the need to constatnly strip trailing/leading "AND"s from his auto-generated WHERE expressions. I suggested simply starting each with "WHERE 1=1" and simply prefix every additional term with an "AND".