IEnumerable<T> and interleaving of I/O with CPU


Not all problems lend themselves to this, but as a general suggestion: if you're processing a T[], an {I,}List<T>, an ArrayList, or a similar collection of items (especially ones coming from I/O sources - files, network, etc.) in a single pass, you should consider (as with all perf, measure, measure, measure!) going the piecemeal route with IEnumerable<T> (or just IEnumerable for the non-generic case, of course).

It's admittedly contrived, but let's look at a method that counts the number of regex matches in a file on a line-by-line basis.

static long CountMatches(string filename, Regex regex)
{
    // Read the entire file into memory up front...
    string[] lines = File.ReadAllLines(filename);

    // ...then burn CPU walking the array.
    long matchCount = 0;
    foreach (string line in lines)
    {
        matchCount += regex.Matches(line).Count;
    }
    return matchCount;
}

As you can imagine, this approach means lots of I/O as we read the entire file into an array in memory (is that your working set or are you just happy to see me?), then lots of CPU as we loop over that array.  The entire time we're fetching the file into memory, the CPU likely has plenty of spare cycles that could have been crunching Regex.Matches for us.  We're also keeping strong references to every member of the array (the entire contents of the file), even after we're done processing each one and could have made it available for GC.

Switching over to an IEnumerable<string> approach (which I like to think of as a "stream of strings", though I need a better phrase since Stream already has a BCL meaning, of course), we can fix a lot of those issues.

static IEnumerable<string> ReadFileLines(string filename)
{
    using (StreamReader streamReader = File.OpenText(filename))
    {
        while (!streamReader.EndOfStream)
        {
            // Hand back one line at a time; execution resumes here
            // when the caller asks for the next one.
            yield return streamReader.ReadLine();
        }
    }
}

static long CountMatches(string filename, Regex regex)
{
    IEnumerable<string> lines = ReadFileLines(filename);
    long matchCount = 0;
    foreach (string line in lines)
    {
        // Each iteration pulls the next line on demand instead of
        // waiting for the whole file to load.
        matchCount += regex.Matches(line).Count;
    }
    return matchCount;
}

Now we're better off on multiple fronts:

  1. We can start the Regex'ing as soon as we get the first line - our CPU isn't going to twiddle its thumbs (today's secret word is anthropomorphize) waiting on the entire file to be read.
  2. We're not forcing the entire file to be in memory - we only keep references to lines as long as we need them, and then they're free for the GC to collect as needed.
  3. If the CPU can keep up (very likely here), the uncached-file case's total time drops from roughly (load time) + (time for N regex.Matches calls) to roughly (load time) + (time for 1 regex.Matches call), since all but the last line's matching overlaps with the I/O (see the back-of-the-envelope below).
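
To make that third point concrete, here's a back-of-the-envelope illustration - the numbers are invented for the arithmetic, not measured: say the file has 10,000 lines, each line costs 1 ms of I/O wait to read, and each regex.Matches call costs 0.1 ms of CPU.  Read-everything-then-process costs roughly 10,000 × 1 ms + 10,000 × 0.1 ms = 11 seconds.  Interleaved, the 0.1 ms of CPU for line i hides inside the 1 ms of I/O wait for line i+1, so the total is roughly 10,000 × 1 ms + 0.1 ms ≈ 10 seconds - every Matches call but the last rides along for free.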

Admittedly, this somewhat leverages the knowledge that the various layers (filesystem caches, hard drives, etc.) will pre-fetch/read-ahead: once they notice you've read pages/sectors 1..N of a file, they assume you'll likely want N+1..N+x as well and start fetching those ahead of time to cut down latency.

It's not a panacea, of course, so it's worth listing the negatives:

  1. Total CPU cycles needed by this process will be higher.  While the IEnumerable introduces a state machine, my gut feeling is that the bigger CPU-cycle increase will come from a potential loss of icache/dcache perf.  We traded nice, tight loops of Matches calls for a loop that bounces through MoveNext, the iterator's state machine, StreamReader, etc. (roughly what that loop expands to is sketched after this list).
  2. If there's an IOException halfway through the file (definitely a corner case), you may have wasted a bunch of CPU, depending on whether you'd be fine reporting partial results or not.  More generally, you've lost early knowledge of whether the entire data structure could be built at all, and you'd obviously already lost up-front knowledge of things like total count/size.
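
For the curious, here's roughly what the compiler expands the foreach inside CountMatches into - a simplified sketch, not the exact generated code:

// Approximately what "foreach (string line in lines)" becomes:
using (IEnumerator<string> enumerator = lines.GetEnumerator())
{
    while (enumerator.MoveNext())           // resumes the iterator's state machine...
    {
        string line = enumerator.Current;   // ...which may run StreamReader code to fetch the next line
        matchCount += regex.Matches(line).Count;
    }
}

Each MoveNext call transfers control back into ReadFileLines' state machine, which is where the extra work (relative to indexing into an array) comes from.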

Obviously, this is contrived, as BufferedStream is also available, among other techniques.

If you're making an API for others to consume, likewise consider taking IEnumerable<T> (or just IEnumerable) if you can operate on your input as such.  This is already in the design guidelines, but it helps free your API's consumers to feed you items piecemeal instead of needing to build a data structure ahead of time.  They can still pass T[] or List<T> if they happen to have that, of course.
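
As a small illustration of that shape (this reshaping is mine, not a snippet from the guidelines), the counting logic can take the lines themselves rather than a filename:

// Hypothetical reshaping: accept any source of lines, not just a file.
static long CountMatches(IEnumerable<string> lines, Regex regex)
{
    long matchCount = 0;
    foreach (string line in lines)
    {
        matchCount += regex.Matches(line).Count;
    }
    return matchCount;
}

Callers with a file on disk can pass ReadFileLines(filename), callers who already have a string[] or List<string> can pass that directly, and nobody is forced to materialize a collection they didn't already have.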

And, since it is worth repeating, as with all perf issues, measure, measure, measure!


Comments (3)
  1. Anonymous says:

    > And, since it is worth repeating, as with all perf issues, measure, measure, measure!

    Funny that you didn’t attempt to measure your example 🙂

  2. Eric Newton says:

    This is all well and good, but sometimes regex’s are multi-line spanners… at that point you’re forced to load the entire file into memory

    …at least until Regex gets some kind of streaming helper? Hmmm, that would be an interesting exercise…

  3. @Anonymous:

    I actually produced numbers on my own, but decided against posting them because the last thing I want is people saying "take this approach, it’s X% faster according to this one guy" – that would miss the point. Any alternative benchmarked in a scenario other than yours is *useless* to you, so rather than post numbers that would have people tempted to *not* measure their own scenario, I decided to skip the numbers. Lies, damned lies, and benchmarks!

    @Eric:

    Like I said in the post (where I said "on a line-by-line basis"), it’s contrived. If anyone judges the technique based on a specific example (this is true for any programming construct, of course), they’re doing themselves a disservice. 🙂 I tried to pick something that would be self-documenting, small, and at least conceptually make sense to people.

    Also, a "streaming helper", in the more generic sense (not specific to just Regex), would amount to setting up a producer/consumer: the producer running on a background thread, a shared data structure with a lock protecting access, and our main thread consuming.
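
    A minimal sketch of that shape, purely illustrative - no exception marshaling, no bounded buffer, completion signaled with a null sentinel; it borrows ReadFileLines from the post and needs System.Collections.Generic and System.Threading:

    static long CountMatchesProducerConsumer(string filename, Regex regex)
    {
        Queue<string> queue = new Queue<string>();
        object queueLock = new object();

        // Producer: read lines on a background thread; null means "done".
        Thread producer = new Thread(delegate()
        {
            foreach (string line in ReadFileLines(filename))
            {
                lock (queueLock)
                {
                    queue.Enqueue(line);
                    Monitor.Pulse(queueLock);
                }
            }
            lock (queueLock)
            {
                queue.Enqueue(null);    // sentinel: no more lines coming
                Monitor.Pulse(queueLock);
            }
        });
        producer.Start();

        // Consumer: the main thread drains the queue and runs the regex.
        long matchCount = 0;
        while (true)
        {
            string line;
            lock (queueLock)
            {
                while (queue.Count == 0)
                {
                    Monitor.Wait(queueLock);
                }
                line = queue.Dequeue();
            }
            if (line == null)
            {
                break;
            }
            matchCount += regex.Matches(line).Count;
        }
        return matchCount;
    }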

    That’s certainly another approach, but in many cases, especially where the producer is slow, CPU-light, and I/O-heavy, you’re effectively still single-threaded, and worse, you’ve introduced complexity and multi-threading issues for potentially little or no (potentially even negative) gain.

    So, the other good perf rule – write the simplest code that could possibly work, then *if* you’re not meeting perf requirements, profile to find perf problems and consider other techniques. Every time I see code written that assumed the perf of a simpler approach wouldn’t be sufficient without actually measuring it, I die a little inside. 🙂
