Performance Quiz #3


The funny thing about managed code is that sometimes people forget that just because we offer an easy, convenient way to do things for quick RAD applications doesn't mean the RAD way is in fact the right way. The good old-fashioned way is very likely still available and may be an excellent choice. So that brings us to Performance Quiz #3.


The question: characterize the difference in space and speed between RadParser and ClassicParser; consider the given test input to be typical.


I’ll post an analysis in a few days.  Don’t peek at the feedback until you’re done with your own work 🙂


——— radparser.cs —————


using System;
using System.IO;


class RadParser
{
    static void Main(String[] args)
    {
        StreamReader stream = new StreamReader(args[0]);


        String line;
        Char[] delims = { ' ', '\t' };
        int sum = 0;


        while ((line = stream.ReadLine()) != null)
        {
            String[] fields = line.Split(delims);


            foreach (String field in fields)
            {
                sum += Int32.Parse(field);
            }
        }


        stream.Close();
        Console.WriteLine("The total of the ints is: {0,8:n0}", sum);
    }
}


——— testfile.txt (2500 lines of this) —————
234 2348 -7295 27 52932 -2352 2352 252532 1 34 452 52 345 2 245 4525
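Since only a single sample line is shown, here is a quick sketch (in Python, for convenience; the filename `testfile.txt` matches the listing above) of how one might generate an equivalent input file to run the two parsers against:

```python
# Sketch: build a testfile.txt like the one above --
# 2,500 copies of the sample line, one per line.
sample = "234 2348 -7295 27 52932 -2352 2352 252532 1 34 452 52 345 2 245 4525"

with open("testfile.txt", "w") as f:
    for _ in range(2500):
        f.write(sample + "\n")
```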



——— classicparser.cs —————


using System;
using System.IO;


class ClassicParser
{
    static void Main(String[] args)
    {
        FileStream fs = new FileStream(args[0], FileMode.Open, FileAccess.Read);
        Byte[] bytes = new Byte[4096];


        int sum = 0;
        int tmp = 0;
        bool negative = false;
        bool digits = false;


        for (;;)
        {
            int count = fs.Read(bytes, 0, bytes.Length);


            if (count <= 0)
            {
                if (negative)
                    sum -= tmp;
                else
                    sum += tmp;
                break;
            }


            for (int i = 0; i < count; i++)
            {
                switch ((char)bytes[i])
                {


                case ' ': case '\t': case '\r': case '\n':
                    if (negative)
                        sum -= tmp;
                    else
                        sum += tmp;


                    negative = false;
                    digits = false;
                    tmp = 0;
                    break;


                case '-':
                    if (negative)
                        throw new FormatException("two negatives");


                    if (digits)
                        throw new FormatException("negative after digits");


                    negative = true;
                    break;


                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9':
                    digits = true;
                    tmp = tmp*10 + bytes[i] - (Byte)'0';
                    break;


                default:
                    throw new FormatException("invalid character");
                }
            }
        }


        fs.Close();
        Console.WriteLine("The total of the ints is: {0,8:n0}", sum);
    }
}
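Before peeking at the analysis, it can help to convince yourself the two strategies really compute the same answer. A rough Python sketch (illustrative only; it mirrors the structure of the two C# programs above, not their exact semantics):

```python
# The "RAD" strategy: split each line into substrings, parse each one.
# This allocates a list of field strings per line, like String.Split.
def rad_sum(text):
    total = 0
    for line in text.splitlines():
        for field in line.split():
            total += int(field)
    return total

# The "classic" strategy: scan raw bytes once, accumulating digits by hand,
# like ClassicParser's switch loop (simplified; no error checking here).
def classic_sum(raw):
    total = tmp = 0
    negative = False
    for b in raw:
        c = chr(b)
        if c in " \t\r\n":
            total += -tmp if negative else tmp
            tmp = 0
            negative = False
        elif c == "-":
            negative = True
        else:
            tmp = tmp * 10 + (b - ord("0"))
    total += -tmp if negative else tmp  # flush the final number, as ClassicParser does at EOF
    return total

data = ("234 2348 -7295 27 52932 -2352 2352 252532 1 34 452 52 345 2 245 4525\n") * 2500
assert rad_sum(data) == classic_sum(data.encode("ascii"))
```

The interesting part of the quiz is not the answer they agree on, but what each strategy allocates and touches along the way.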

Comments (4)

  1. Marco Russo says:

    The RAD parser creates 16 new strings for each line. If you have 2,500 lines, that's 40,000 new strings that have to be collected by the GC. Not to mention the 2,500 arrays created just to feed the foreach loop.

    In terms of speed, there are possibly two bottlenecks: the String.Split method and the Int32.Parse method. I haven't checked with a profiler how much each costs. GC isn't a heavy load in my tests.

    Anyway, I prefer to use the RAD source code unless the performance impact of the parser becomes a real bottleneck in a real application… 🙂

  2. Rico Mariani says:

    Some people have pointed out that the two parsers don’t parse exactly the same thing. This is of course correct — the 2nd parser is somewhat simplified compared to the first. I did this largely in the interest of keeping the source code small and somewhat understandable.

    One of the things I plan to discuss in the analysis is the complexity/correctness part of the trade-off.

    However, it is worth noting that tightening or widening the set of strings the 2nd parser accepts would not materially affect its performance characteristics, so for purposes of this analysis what is there is fairly representative.

  3. Ryan Lamansky (Kardax) says:

    It’s also worth mentioning that the second approach is almost free of function calls.

    One of the important performance risks of calling into an API (particularly a closed-source one) is that you have no idea what it does to get your results, though you can be fairly sure it's not specially optimized for exactly what you're doing.

    Since the whole "problem" is closed, the JIT could (theoretically) optimize all the extra stuff away, but it’ll be a long time before we see a JIT that powerful 🙂
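Marco's allocation estimate in the first comment can be sanity-checked with quick arithmetic (assuming 16 fields per line, as in the sample input):

```python
lines = 2500                          # lines in testfile.txt
fields_per_line = 16                  # integers per line in the sample input
substrings = lines * fields_per_line  # one new String per field from Split
arrays = lines                        # one String[] per call to Split
print(substrings, arrays)             # → 40000 2500
```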