String.Split and some more robust behavior [Kit George]

We get a lot of people pointing out a fundamental design issue with String.Split: it splits at every separator it finds, even when separators are contiguous, so adjacent separators produce empty entries. This is actually useful for some scenarios. For example, imagine you have a comma-delimited file with entries like this:

Jones,Bob,2308978,,,47 Baker Street,,Orlando,Florida

It is useful to preserve the empty entries between consecutive commas: if you're presenting this data in something like a datagrid, String.Split always returns an array of a fixed length. This way, you know that item 7 (index 6) in the array returned from String.Split will always be Address Line 2 (for example). Note we're assuming the line is always well formed.
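As a rough sketch of the fixed-position idea: the class name CsvLineExample and its methods are made up for this illustration, and the line is assumed to be well formed with no quoted commas. The default Split keeps the empty fields, so each column stays at the same index:

```csharp
using System;

public static class CsvLineExample
{
    // Hypothetical helper: splits one well-formed, comma-delimited line.
    // The default Split keeps empty entries, so field positions stay stable.
    public static string[] ParseLine(string line)
    {
        return line.Split(new char[] { ',' });
    }

    // Illustration only: prints each field with its zero-based index.
    public static void PrintFields(string line)
    {
        string[] fields = ParseLine(line);
        for (int i = 0; i < fields.Length; i++)
        {
            Console.WriteLine("Field {0} = '{1}'", i, fields[i]);
        }
    }
}
```

Calling CsvLineExample.ParseLine on the sample line above returns nine fields: index 5 holds "47 Baker Street" and index 6 (the seventh item) is the empty Address Line 2.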

So the existing String.Split works well for this kind of scenario. The issue people raise is that it does NOT work well when you want to split on boundaries and runs of consecutive split characters should count as a single boundary. For example, imagine you're attempting to split a sentence on word boundaries, something like this:

String s = "Hello and Welcome! This is a sentence, my friends. ";
String[] words = s.Split(new char[] {' ',',','.','!','?'});

Console.WriteLine("Number of words = {0}", words.Length);
for (int i = 0; i < words.Length; i++) {
    Console.WriteLine("Word {0} = '{1}'", i + 1, words[i]);
}

In this situation, you'll see that Split returns several 'empty' entries, since the split occurs at every split character specified, even when two split characters are adjacent (such as the '!' followed by a space). For the string as written above, the code mistakenly reports that there are 13 words in the sentence, when there are really 9. We could of course remove the empty entries ourselves, but it would be nicer to have the function do this for us, without having to resort to the more complex RegEx mechanisms.

Well, there's good news: in Whidbey (.NET Framework 2.0), you have new String.Split overloads which give you what you need to make this happen. String.Split(char[], StringSplitOptions) allows you to specify StringSplitOptions.RemoveEmptyEntries, which gives you the right behavior in the above case. So instead of the above Split line, you use this:

String[] words = s.Split(new char[] {' ',',','.','!','?'}, StringSplitOptions.RemoveEmptyEntries);

This will support the second scenario, and return 9 words, with no empty entries.
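To make that concrete, here is a small sketch that wraps the call in a helper. The class name WordSplitExample, the method SplitWords, and the Separators field are names invented for this illustration; only the String.Split overload itself comes from the framework:

```csharp
using System;

public static class WordSplitExample
{
    // Same separator set as the snippet above.
    private static readonly char[] Separators = new char[] { ' ', ',', '.', '!', '?' };

    // Hypothetical helper: returns only the non-empty pieces between separators.
    public static string[] SplitWords(string s)
    {
        return s.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Illustration only: prints the word count and each word.
    public static void PrintWords(string s)
    {
        string[] words = SplitWords(s);
        Console.WriteLine("Number of words = {0}", words.Length);
        for (int i = 0; i < words.Length; i++)
        {
            Console.WriteLine("Word {0} = '{1}'", i + 1, words[i]);
        }
    }
}
```

For the sample sentence, SplitWords returns the nine words "Hello" through "friends", and the trailing space at the end of the string no longer produces an extra entry.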

For those who are also wondering about splitting on Strings (String.Split previously supported only splitting on characters), that's also supported now! So you can split on specific strings (for example, you want to split a string into sections based on finding the term "Item:"), as opposed to characters or groups of characters:

public string[] Split(string[] separator, StringSplitOptions options);
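A minimal sketch of the "Item:" case might look like this. The class name ItemSplitExample, the method SplitItems, and the sample input are assumptions made up for illustration:

```csharp
using System;

public static class ItemSplitExample
{
    // Hypothetical helper: cuts a string into sections at each "Item:" marker.
    // RemoveEmptyEntries drops the empty piece in front of a leading marker.
    public static string[] SplitItems(string text)
    {
        string[] separators = new string[] { "Item:" };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Given the input "Item:apples Item:pears Item:plums", SplitItems returns three sections: "apples ", "pears " and "plums". Split does not trim whitespace around the sections, so you may want to call Trim on each one.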