TextWriter.WriteLine(string, object[]) in less memory, less time

I found myself increasingly generating reports in .csv format, and then use Excel to analyze the reports. With large amount of data written to the .csv file, file generation time becomes more and more noticable. So I took some time to check what is taking so much time and how can it be reduced.

For one .csv file I'm generating, the report generation is essentially a big loop with tons of TextWriter.WriteLine calls:

 static public void Version0()
{
    for (int i = 0; i < count; i++)
    {
        writer.WriteLine(
            "\"{0:x8}\",{1},{2},\"{3}\",\"{4}\",\"{5}\"",
            Data.Column1(i), Data.Column2(i), Data.Column3(i),
            Data.Column4(i), Data.Column5(i), Data.Column6(i));
    }
}

Profiling shows that TextWriter.WriteLine(string, object[]) forms a string first, and then calls WriteLine(string). But the latter allocates a char array for the string plus "\r\n", sending combined data to the basestream. It's nice that the lines would now be completed in a single call, but there is extra memory and CPU cost. Given that we're not writing to a console, and there is only a single writer, here is the next version:

 static public void Version1()
{
    for (int i = 0; i < count; i++)
    {
        writer.Write(
            "\"{0:x8}\",{1},{2},\"{3}\",\"{4}\",\"{5}\"",
            Data.Column1(i), Data.Column2(i), Data.Column3(i),
            Data.Column4(i), Data.Column5(i), Data.Column6(i));

        writer.WriteLine();
    }
}

The change here is quite simple: splitting a single WriteLine call into two calls, char[] allocation is gone. Now the program is that TextWriter is creating a StringBuilder to form the output string, and string is converted to char[] before writing to the base stream. Given that it's important to our scenario, we can remove those extra cost too:

 static public void Version2()
{
    StringBuilder sb = new StringBuilder(512);
    char[] buffer = new char[512];

    for (int i = 0; i < count; i++)
    {
        sb.Clear();
        sb.AppendFormat(
            "\"{0:x8}\",{1},{2},\"{3}\",\"{4}\",\"{5}\"",
            Data.Column1(i), Data.Column2(i), Data.Column3(i),
            Data.Column4(i), Data.Column5(i), Data.Column6(i));

        sb.CopyTo(0, buffer, 0, sb.Length);
        writer.Write(buffer, 0, sb.Length);
        writer.WriteLine();
    }
}

In this version, a StringBuilder and char[] array is created outside of the loop, and reused for each line in the .csv file. After a line is formed in the StringBuilder, we copy the result to the char[] array instead of converting it to string first.

At this stage, the biggest spender is calling AppendFormat. To pass data for the 6 columns to AppendFormat, C# generates code to allocate an Object[] array, and then boxes up three integers. Within AppendFormat implementation, Int32.ToString method is called to convert integers to string with invariant culture info. It will cost less memory and CPU to call ToString outselves:

 static public void Version3()
{
    StringBuilder sb = new StringBuilder(512);
    char[] buffer = new char[512];

    CultureInfo info = CultureInfo.InvariantCulture;
                        
    for (int i = 0; i < count; i++)
    {
        sb.Clear();
        sb.AppendFormat(
            "\"{0}\",{1},{2},\"{3}\",\"{4}\",\"{5}\"",
            Data.Column1(i).ToString("x8", info),
            Data.Column2(i).ToString(info),
            Data.Column3(i).ToString(info),
            Data.Column4(i), Data.Column5(i), Data.Column6(i));

        sb.CopyTo(0, buffer, 0, sb.Length);
        writer.Write(buffer, 0, sb.Length);
        writer.WriteLine();
    }
}

If you profile the latest code again, you will find the three ToString calls consumes the most memory and CPU, it's time to replace them:

 static public void Version4()
{
    StringBuilder sb = new StringBuilder(512);
    char[] buffer = new char[512];

    CultureInfo info = CultureInfo.InvariantCulture;

    for (int i = 0; i < count; i++)
    {
        sb.Clear();

        sb.Append('"');
        sb.AppendInt32(Data.Column1(i), 16, 8);
        sb.Append("\",");

        sb.AppendInt32(Data.Column2(i));
        sb.Append(',');

        sb.AppendInt32(Data.Column3(i));
        sb.Append(",\"");

        sb.Append(Data.Column4(i));
        sb.Append("\",\"");

        sb.Append(Data.Column5(i));
        sb.Append("\",\"");

        sb.Append(Data.Column6(i));
        sb.Append('"');

        sb.CopyTo(0, buffer, 0, sb.Length);
        writer.Write(buffer, 0, sb.Length);
        writer.WriteLine();
    }
}

If you wonder how much memory and CPU can these changes save us, here is the performance test result:

 WriteLine.exe

       WriteLine x  1,000,000   1292.942 ms 161 gc,  1,829,516 bytes
           Write x  1,000,000    989.864 ms 122 gc,  2,527,544 bytes
   StringBuilder x  1,000,000    925.884 ms  84 gc,    985,392 bytes
        ToString x  1,000,000    778.161 ms  43 gc,    415,552 bytes
       AppendInt x  1,000,000    638.102 ms   0 gc,    423,744 bytes

When generating a .csv file of 1 million rows, the original version takes 1.292 seconds, causing 161 garbage collections, and the final heap has 1.829 mb allocations. Subsequent versions reduce memory allocations and CPU time at the same time. The final version finishes in 0.638 seconds with not a single garbage collection. This indicates that the final version does not allocate any managed memory in the big loop.

HereĀ  are CPU sample events displayed in PerfView:

Total program run takes 5.218 seconds, the original version takes 1.36 seconds (25.2% of total time), the final version takes 0.673 seconds.

Here are the CLR memory allocation tick events displayed in PerfView:

Total program run allocates 1.296 Gb of memory. The original version allocates 511 mb memory, StringBuilder.ToString allocates 130 mb, TextWrite.WriteLine allocates 117 mb. The final version (Version4) is not even shown here. CLR memory allocation tick event is fired once for every 100kb memory allocation or when big objects are allocated. Version4 only allocates a few small objects before entering the big loop, not enough to fire an event.