More Notes on LINQ to Text Files


Following are a few additional notes regarding the LINQ to Text Files example.

Taking Advantage of Multiple CPUs

If you need to process large text files and that processing is processor-intensive, you may in the future be able to factor the work into separate LINQ queries and let a future version of LINQ parcel each query out to a separate CPU. That way, you could take advantage of parallel processors without extensive modifications to your program.
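To make the idea concrete, here is a minimal sketch (not the original example's code) of what that factoring might look like, assuming .NET 4 or later, where PLINQ's AsParallel is available and File.ReadLines streams lines lazily. The file name and ExpensiveTransform are illustrative placeholders for a processor-intensive operation.

    using System;
    using System.IO;
    using System.Linq;

    class ParallelSketch
    {
        // Stand-in for a processor-intensive operation on one line.
        static string ExpensiveTransform(string line)
        {
            return line.ToUpperInvariant();
        }

        static void Main()
        {
            // The work is factored into ordinary LINQ operators; adding or
            // removing the AsParallel() call is the only change needed to
            // switch between sequential and parallel execution.
            var results = File.ReadLines("huge.txt")
                .AsParallel()
                .Where(line => line.Length > 0)
                .Select(line => ExpensiveTransform(line));

            foreach (string result in results)
                Console.WriteLine(result);
        }
    }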

In the distant past, I worked peripherally on an Operations Research project that needed to do extensive computations using an optimization engine. The idea of implementing processor-intensive algorithms using LINQ is interesting.

This article on PLINQ provides more information.

Memory Usage Profile

One note about using LINQ to text files: if you run the example I provided under the debugger, you can see (even though the Visual Studio debugger doesn't fully support LINQ queries) that the query expressions execute before a single line of text is read from the file. The main point of the example is that this technique lets you work with huge files while maintaining a small memory usage profile. Without the technique, you could have written the same query using File.ReadAllLines, but that populates a string array with the entire text file.
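The following is a small, self-contained sketch of that trade-off (the file name is illustrative, and it uses File.ReadLines from .NET 4+ in place of the original example's hand-written lazy line reader):

    using System;
    using System.IO;
    using System.Linq;

    class MemorySketch
    {
        static void Main()
        {
            // Eager: ReadAllLines materializes the whole file as a string[]
            // before the query runs, so memory use grows with the file size.
            var eager = File.ReadAllLines("huge.txt")
                .Where(line => line.Length > 80);

            // Lazy: ReadLines (or a hand-written iterator, as in the original
            // example) yields one line at a time, so only the current line is
            // in memory no matter how big the file is.
            var lazy = File.ReadLines("huge.txt")
                .Where(line => line.Length > 80);

            foreach (string line in lazy)
                Console.WriteLine(line);
        }
    }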

Yield Return

In order to understand how lazy evaluation works, the key pieces of technology to understand are iterator blocks and the yield contextual keyword. The C# 2.0 specs are fun to read.
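Here is a minimal sketch of an iterator block (the names are illustrative, not from the spec). The method body does not run when the method is called; it runs piecemeal, one yield return at a time, as the caller enumerates the result:

    using System;
    using System.Collections.Generic;

    class IteratorSketch
    {
        // An iterator block: the compiler turns this into a state machine, and
        // the body runs only as the caller pulls each element.
        static IEnumerable<int> Numbers()
        {
            Console.WriteLine("about to yield 1");
            yield return 1;
            Console.WriteLine("about to yield 2");
            yield return 2;
        }

        static void Main()
        {
            IEnumerable<int> seq = Numbers();   // nothing printed yet: execution is deferred
            Console.WriteLine("sequence defined");

            foreach (int n in seq)              // each iteration resumes Numbers() after its last yield
                Console.WriteLine("got " + n);
        }
    }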

Parallels to Text File Processing Using Piping

I was thinking about this the other day, contemplating how you can do text processing by piping text from one process to another. Done properly, that technique can also process huge text files with a small memory footprint: the first process reads some lines, processes them, and pipes them on to the next process; the OS then does a context switch, and the next process takes over and does its processing. However, this comes with significant overhead, because context switches are not cheap. LINQ can also process huge text files with a small footprint, but the underlying mechanics are, of course, vastly different. I'll be following PLINQ with interest.
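As a hedged illustration of the analogy (the file name and the "WARN" pattern are assumptions, not part of the original example), each LINQ operator below plays the role of one stage in a pipeline such as grep WARN app.log | head -n 10, except that everything runs in a single process, with each line flowing through the whole chain before the next line is read:

    using System;
    using System.IO;
    using System.Linq;

    class PipelineSketch
    {
        static void Main()
        {
            // One line at a time flows through the whole chain, so there are
            // no inter-process context switches.
            var firstTenWarnings = File.ReadLines("app.log")
                .Where(line => line.Contains("WARN"))   // the "grep" stage
                .Take(10);                              // the "head" stage

            foreach (string line in firstTenWarnings)
                Console.WriteLine(line);
        }
    }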