Deriving from TextReader

TextReader is an abstract base class that represents reading a textual stream. It’s like an enumerator for characters (IEnumerable<char>). Common derived classes in the framework include StringReader (which presents a string as a text stream) and StreamReader (which presents a byte stream, such as a file, as a text stream).
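TextReader exposes several read methods, including:
    int Peek()
    int Read()
    int Read(char[] buffer, int index, int count)
    string ReadLine()
    string ReadToEnd()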

A small digression about why TextReader is so cool…
I must digress and mention that I like the TextReader abstraction quite a bit (though arguably it should have been an interface) because:
1) It’s a very useful abstraction. Lots of different things can be represented as a character stream (such as strings, files, and keyboard input).
2) It’s very easy for derived classes to implement (contrast this with other abstract base classes that have 25+ abstract methods, like XmlReader).
3) It allows for some great class composition:
    Any class that needs a character stream (such as a parser) can just take a TextReader, and it can then easily get input from an endless variety of sources. Another example: Console.In is also a TextReader, which makes it very easy to replace stdin with a source other than the keyboard (via Console.SetIn).
    You can also chain TextReaders together. For example, you could build an “EncryptingTextReader” that wraps another TextReader, applies a primitive encoding to each character, and forwards the result. Imagine: EncryptingTextReader.Read() { return Encode(m_innerTextReader.Read()); }
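Here is a minimal sketch of that chaining idea. EncryptingTextReader is hypothetical, and ROT13 stands in for the "primitive encoding" purely for illustration. LineCounter.CountLines is a hypothetical consumer that only knows about TextReader, so it works the same on a StringReader, a StreamReader, Console.In, or the wrapper.

```csharp
using System;
using System.IO;

// Hypothetical decorator: wraps another TextReader and encodes each character
// on the way through. ROT13 is used only as a stand-in "primitive encoding".
public sealed class EncryptingTextReader : TextReader
{
    private readonly TextReader m_innerTextReader;

    public EncryptingTextReader(TextReader inner)
    {
        m_innerTextReader = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Forward to the inner reader and encode the result; -1 (eof) passes through unchanged.
    public override int Read() => Encode(m_innerTextReader.Read());

    // Override Peek() too, so lookahead agrees with Read().
    public override int Peek() => Encode(m_innerTextReader.Peek());

    protected override void Dispose(bool disposing)
    {
        if (disposing) m_innerTextReader.Dispose();
        base.Dispose(disposing);
    }

    // ROT13 on ASCII letters; everything else (including -1) is returned as-is.
    private static int Encode(int ch)
    {
        if (ch >= 'a' && ch <= 'z') return 'a' + (ch - 'a' + 13) % 26;
        if (ch >= 'A' && ch <= 'Z') return 'A' + (ch - 'A' + 13) % 26;
        return ch;
    }
}

// Hypothetical consumer: depends only on TextReader, not on where the text comes from.
public static class LineCounter
{
    public static int CountLines(TextReader reader)
    {
        int count = 0;
        while (reader.ReadLine() != null) count++;
        return count;
    }
}
```

For example, reading "Hello\nWorld" through new EncryptingTextReader(new StringReader(...)) yields "Uryyb\nJbeyq", and after Console.SetIn(new StringReader("a\nb\nc")) the same CountLines method happily counts Console.In.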

Back on topic: deriving from TextReader:
I’d like to drill into that 2nd point about how easy TextReader is for derived classes to implement. To derive from an abstract base class, you need to implement all of its abstract methods. Clearly, the more abstract methods there are, the harder it is to derive from.
One way class authors mitigate this is by making methods virtual (not abstract) with intelligent default behaviors. For example, there’s a lot of redundancy between the TextReader methods above. In fact, all of them can be implemented on top of a single character-based Read(). The TextReader class actually provides default implementations of Read(char[], int, int), ReadLine() and ReadToEnd() that build upon Read(). (Unfortunately, it does not provide a good default Peek() implementation.)
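As a quick sketch of those defaults, here is a hypothetical reader (CharOnlyReader) that overrides nothing but Read(). The buffered Read(), ReadLine() and ReadToEnd() calls in the demo all come from TextReader. The input deliberately uses '\n' line endings: the default ReadLine() peeks ahead to fold "\r\n" into one line break, and this reader doesn't override Peek().

```csharp
using System;
using System.IO;

// Hypothetical reader over a string that overrides ONLY the single-character Read().
// Read(char[], int, int), ReadLine() and ReadToEnd() are inherited from
// TextReader's default implementations, which are built on Read().
public sealed class CharOnlyReader : TextReader
{
    private readonly string m_text;
    private int m_pos;

    public CharOnlyReader(string text)
    {
        m_text = text ?? throw new ArgumentNullException(nameof(text));
    }

    // Return the next character and advance, or -1 at end of input.
    public override int Read() => m_pos < m_text.Length ? m_text[m_pos++] : -1;
}

public static class CharOnlyDemo
{
    // Exercise the inherited methods and return what each one produced.
    public static (string line, string block, string rest) Run()
    {
        var reader = new CharOnlyReader("first\nsecond\nthird");
        string line = reader.ReadLine();                // inherited: "first"
        var buffer = new char[3];
        int n = reader.Read(buffer, 0, buffer.Length);  // inherited: fills "sec"
        string block = new string(buffer, 0, n);
        string rest = reader.ReadToEnd();               // inherited: "ond\nthird"
        return (line, block, rest);
    }
}
```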

1) What about Peek()? The default TextReader.Peek() implementation just returns -1, which means end-of-file (use ildasm to check for yourself!). This is very lame and flat-out wrong because:
    1a) If somebody overrides Read() but not Peek(), their derived class now exposes inconsistent behavior as a TextReader. That could be a very hard bug to track down, because it’s going to manifest in the depths of some 3rd-party parser consuming your derived TextReader. At the very least, the default Peek() should throw a NotImplementedException to avoid such bugs.
    1b) This is doubly silly because you can build Peek() on top of Read(), so requiring derived classes to override both Peek() and Read() means requiring them to do extra work. The base class could have just provided a default implementation of TextReader.Peek() which uses Read(). Then it would actually behave correctly.
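Here is a small sketch of the 1a failure mode. ForgetfulReader is hypothetical: it overrides Read() but not Peek(). A Peek()-driven loop (a common parser idiom) sees an immediate end-of-file. And because the default ReadLine() calls Peek() to decide whether a '\r' is followed by '\n', a "\r\n" line break turns into an extra empty line.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical reader that overrides Read() but "forgets" Peek(),
// so TextReader.Peek() keeps returning -1 even when data remains.
public sealed class ForgetfulReader : TextReader
{
    private readonly string m_text;
    private int m_pos;

    public ForgetfulReader(string text) { m_text = text; }

    public override int Read() => m_pos < m_text.Length ? m_text[m_pos++] : -1;
}

public static class PeekBugDemo
{
    // Peek()-driven copy loop: stops as soon as Peek() reports eof.
    public static string CopyWithPeekLoop(TextReader reader)
    {
        var sb = new StringBuilder();
        while (reader.Peek() != -1)
        {
            sb.Append((char)reader.Read());
        }
        return sb.ToString();
    }

    // Collect every line returned by ReadLine() until it returns null.
    public static List<string> ReadAllLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null) lines.Add(line);
        return lines;
    }
}
```

With a StringReader over "abc" the copy loop returns "abc", but with ForgetfulReader it returns an empty string. Likewise, ReadAllLines over "a\r\nb" gives "a", "b" for a StringReader but "a", "", "b" for ForgetfulReader.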

2) Read() vs. ReadLine()? Another problem is that derived classes need to specifically override “int Read()” (which reads a single character). Conveniently, the base class then provides default implementations of the other read methods (Read(char[], int, int), ReadLine(), ReadToEnd()) built on top of this single-character Read(). However, sometimes that’s not the best overload for a derived class to implement.
For example, imagine it’s easier to implement ReadLine() than Read(). Perhaps your derived TextReader is wrapping some underlying store that gives you an entire line at a time. In that case, it would be nice if the derived TextReader only had to say:
    public override string ReadLine() { return m_underlyingStore.GetNextLine(); }
and then get the rest of the TextReader functionality for free built on top of that ReadLine().

Sample code for a TextReader based on ReadLine() instead of Read().
Here’s a helper class that deals with both of these problems. It derives from TextReader and then serves as a base class for readers that only want to implement ReadLine(). It provides implementations of Read() and Peek() based on the derived implementation of ReadLine(), and relies on the TextReader base class default implementations for the other Read overloads.

    using System.IO;

    // TextReader class requires an implementation of both Read() and Peek().
    // This is a helper class that implements both of those based off a derived implementation of ReadLine().
    // This is useful if a derived TextReader can implement ReadLine() more easily than just Read().
    // Caution: ReadLine() goes straight to the derived source, so callers should not interleave
    // ReadLine() with Read()/Peek(); any partially consumed line buffered here would be skipped.
    public abstract class ReadLineTextReader : TextReader
    {
        // The default TextReader.Peek() implementation just returns -1. How lame!
        // We can build a real implementation on the same one-character cache that Read() uses.
        public override int Peek()
        {
            FillCharCache();
            return m_charCache;
        }

        // Reads one character and advances. The default TextReader.Read() just returns -1,
        // so a working TextReader must override this.
        public override int Read()
        {
            FillCharCache();
            int ch = m_charCache;
            ClearCharCache();
            return ch;
        }

    #region Character cache support
        int m_charCache = -2; // -2 means the cache is empty. -1 means eof.

        void ClearCharCache()
        {
            m_charCache = -2;
        }

        void FillCharCache()
        {
            if (m_charCache != -2) return; // cache is already full
            m_charCache = GetNextCharWorker();
        }
    #endregion

    #region Worker to get next single character from a ReadLine()-based source
        // The whole point of this helper class is that the derived class is going to
        // implement ReadLine() instead of Read(). So mark that we don’t want to use TextReader’s
        // default implementation of ReadLine(). Null return means eof.
        public abstract override string ReadLine();

        // Gets the next char and advances the cursor.
        int GetNextCharWorker()
        {
            // Fetch a new line from the derived class once the current one is used up.
            if (m_line == null)
            {
                m_line = ReadLine(); // virtual
                m_idx = 0;
                if (m_line == null)
                {
                    return -1; // eof
                }
                m_line += "\r\n"; // need to re-add the newline that ReadLine() stripped
            }

            // Return the current character and advance past it.
            char c = m_line[m_idx];
            m_idx++;
            if (m_idx >= m_line.Length)
            {
                m_line = null; // tell us next time around to get a new line.
            }
            return c;
        }

        // Current buffer
        int m_idx = int.MaxValue;
        string m_line;
    #endregion
    }
So you can now derive from ReadLineTextReader, implement just the ReadLine() method, and get a fully functioning TextReader.
One downside of this base class is that it introduces 3 additional fields (int, int, string), so I’m not advocating that the BCL replace TextReader with this implementation. However, you may find it useful for your own work. (I found it very useful for a pet project; stay tuned for future posts…)
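The one-character cache in ReadLineTextReader isn’t tied to ReadLine(). As a sketch of what 1b is asking for, here is a hypothetical base class, PeekableTextReader, with a made-up protected ReadNextChar() primitive (this is not part of the real TextReader API). A derived class writes that one method, and the base class builds both Read() and Peek() on it, so the two can never disagree. StringSourceReader is likewise a hypothetical example.

```csharp
using System.IO;

// Hypothetical alternative design: derived classes implement one protected
// primitive, and the base class builds BOTH Read() and Peek() on it by
// caching a single character of lookahead.
public abstract class PeekableTextReader : TextReader
{
    private const int Empty = -2; // cache holds nothing; -1 is reserved for eof
    private int m_cache = Empty;

    // The only method a derived class must write: return the next char, or -1 at eof.
    protected abstract int ReadNextChar();

    // Fetch the next char once and keep it until Read() consumes it.
    public sealed override int Peek()
    {
        if (m_cache == Empty) m_cache = ReadNextChar();
        return m_cache;
    }

    // Return the cached char and empty the cache. At eof the cached -1 is kept,
    // so ReadNextChar() is not called again.
    public sealed override int Read()
    {
        int ch = Peek();
        if (ch != -1) m_cache = Empty;
        return ch;
    }
}

// Hypothetical derived class: only ReadNextChar() is implemented.
public sealed class StringSourceReader : PeekableTextReader
{
    private readonly string m_text;
    private int m_pos;

    public StringSourceReader(string text) { m_text = text; }

    protected override int ReadNextChar() => m_pos < m_text.Length ? m_text[m_pos++] : -1;
}
```

With this design a loop like while (reader.Peek() != -1) { reader.Read(); } walks the whole input, and the inherited ReadLine() folds "\r\n" correctly because Peek() is real.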

Comments

  1. CN says:

    Regarding your 1b:

    How would you go about building Peek upon Read? Imagine some code like:

    while (reader.Peek() != -1)
    {
        int c = reader.Read();
        // Do something with c
    }

    (it’s not very useful, but that’s not the point)

    Read will advance the current character, but if Peek relies on Read, it will also do so. You have no way to cache the previous result from Read ’til the real call to Read, as Read is implemented by the subclass and not the TextReader default implementation.

    I think that NotSupportedException would make much more sense, as you also mention.

  2. ASP.NET Podcast #7 – Mobile Development


  3. jmstall says:

    CN – "How would you go about building Peek upon Read?"

    Good question – I should have been clearer.

    What I meant to say is that Peek() and Read() can share a common worker function, GetNextCharWorker() in this case.

    You could build Peek() on top of Read() if you converted Read() into a worker function like GetNextCharWorker(), and then provided Peek and Read implementations like the one I show above.

    The design of TextReader could also make this easier.

  4. Here’s how to use C#’s yield keyword to conveniently implement a stream on top of a complex data source

  5. Implementing an XmlReader is very difficult because there are over 25 abstract methods. Here’s a simple way to change the problem space to implement XmlReader with only 1 real method.
