What’s with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()? Part 2

A little over a year ago I wrote “What’s with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()?” to address the question “Why does GetMaxCharCount(1) for my favorite Encoding return 2 instead of 1?” (The short answer is that the Decoder/Encoder could have stored data from a previous call.)

To follow up, what about the special case of zero? It seems that GetMaxByteCount(0) and GetMaxCharCount(0) should always be 0, yet they can return more. The reason, again, is the encoder/decoder and the fallback.
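To see the values in question, here is a minimal sketch that asks UTF-8 for its worst-case counts with zero input. The class and method names (ZeroCountDemo, MaxCharsForZeroBytes, MaxBytesForZeroChars) are made up for illustration; only Encoding.GetMaxCharCount() and Encoding.GetMaxByteCount() are the real APIs. The exact numbers depend on the encoding and its fallbacks, so the sketch prints them rather than assuming them.

```csharp
using System;
using System.Text;

public static class ZeroCountDemo
{
    // Hypothetical helper: worst-case number of chars for zero input bytes.
    public static int MaxCharsForZeroBytes(Encoding encoding)
    {
        return encoding.GetMaxCharCount(0);
    }

    // Hypothetical helper: worst-case number of bytes for zero input chars.
    public static int MaxBytesForZeroChars(Encoding encoding)
    {
        return encoding.GetMaxByteCount(0);
    }

    public static void Main()
    {
        Encoding utf8 = Encoding.UTF8;

        // Neither call is required to return 0, even though the input is empty.
        Console.WriteLine("GetMaxCharCount(0) = " + MaxCharsForZeroBytes(utf8));
        Console.WriteLine("GetMaxByteCount(0) = " + MaxBytesForZeroChars(utf8));
    }
}
```

If either line prints a nonzero value, that is the behavior this post is about: the maximum has to leave room for data an encoder or decoder might still be holding.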

Consider a call to Decoder.GetChars() whose input ends with a UTF-8 lead byte. The decoder is going to remember that lead byte, expecting the next call to GetChars() to contain the remaining byte(s) necessary to decode a complete UTF-8 sequence.

However, if the next call passes in an empty input buffer yet requests that the decoder be flushed, then the decoder’s going to have to process that lonely lead byte anyway. This happens, for example, at the end of the input. In this case, the decoder’s going to call the fallback for the lone lead byte, which by default for UTF-8 will now return U+FFFD. So even with an empty input buffer, UTF-8 can return a character.
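The scenario above can be sketched in a few lines. This is only an illustration under some assumptions: the class and method names (LoneLeadByteDemo, FlushAfterLoneLeadByte) are hypothetical, 0xE2 is just one arbitrary lead byte of a three-byte sequence, and the decoder is assumed to use the default replacement fallback of Encoding.UTF8.

```csharp
using System;
using System.Text;

public static class LoneLeadByteDemo
{
    // Hypothetical helper: feeds a lone UTF-8 lead byte to a decoder, then
    // flushes it with an empty input buffer. Returns whatever chars the
    // flushing call produced.
    public static string FlushAfterLoneLeadByte()
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[4];

        // 0xE2 starts a three-byte sequence. With flush = false the decoder
        // keeps it and waits for the remaining bytes.
        byte[] leadByte = { 0xE2 };
        int firstCount = decoder.GetChars(leadByte, 0, leadByte.Length, chars, 0, false);
        Console.WriteLine("chars from first call: " + firstCount);

        // Empty input, but flush = true: the stored lead byte must now be
        // resolved, so it goes to the decoder fallback.
        byte[] empty = new byte[0];
        int flushedCount = decoder.GetChars(empty, 0, 0, chars, 0, true);
        Console.WriteLine("chars from flushing call: " + flushedCount);

        return new string(chars, 0, flushedCount);
    }

    public static void Main()
    {
        string result = FlushAfterLoneLeadByte();
        foreach (char c in result)
        {
            // Print each produced char as a code point, e.g. U+FFFD.
            Console.WriteLine("U+" + ((int)c).ToString("X4"));
        }
    }
}
```

With the default fallback I would expect the first call to produce no chars and the flushing call to produce a single U+FFFD, which is why GetMaxCharCount(0) cannot promise 0 for UTF-8.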

Similar cases happen with most other encodings, although there are a few cases where encodings don’t have leftover bytes when decoding.
