Performance Monitor Issues

Performance counters are an essential part of any Windows programmer's diet. They have been around forever and the infrastructure is rarely updated (the last time, I think, was in Vista). IT pros couldn't live without them. Performance and stress testing rely on them. They are a very important part of Windows.

But there are problems with performance counters. Problems that should be fixed but seem to receive no attention. Here is my list.

1. Performance Monitor shows different data on different machines

Performance counter data can be recorded to files using logman. The resulting files have a .blg extension. This is great for performance testing since I can keep all the perf counter data in a compact log file for each test. The problem comes when I try to view these BLG files. Here is a file that was produced on a Windows Server 2008 R2 VM, opened in Performance Monitor on a different VM with the same OS:

Now here is the same blg file opened from a Windows 8 machine:

The behavior is not consistent either. Most of the time I can open BLG files on my Windows 8 machine without a problem. The trouble is that I don't know whether the problem is limited to Windows 8.

The good thing is that this is just a viewer. The blg file still has the correct data. So how about using a different viewer?

2. There are no libraries to read BLG files

Microsoft has never published the official specifications for the BLG format. There are no libraries available to read the format either. It is possible to reverse-engineer the file format to create a library, but you can never be 100% certain that it works in all situations. Given the inconsistencies in the viewers, I wonder if Microsoft itself has lost the specifications for this file format.

The good thing is that you can convert BLG to CSV format using relog.

3. The first line of a converted CSV has empty values

When you convert a BLG file to a CSV, the first line below the header contains some empty values. In relog you can specify a -b parameter to begin at a different timestamp. No matter what you set this to, the first line of the CSV has empty values. Here is a look at a converted file:

The first line has empty spaces in it for some of the values but not all. If you move to 5:22:42 (since the -b parameter is only good up to seconds), you'll get this instead:

While this isn't a big deal, it does mean that if you want counters from a specific time period, you have to factor in that this first line tends to be incomplete.
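If you post-process the converted CSV yourself, one way to cope is to drop the first data row whenever it is incomplete. Here is a minimal Python sketch of that idea. The function name `read_relog_csv` is my own, and I am assuming that a missing value shows up as an empty or whitespace-only field, which is what I see in my converted files.

```python
import csv

def read_relog_csv(lines):
    """Parse CSV text produced by relog and return (header, rows).

    `lines` is any iterable of strings, such as an open file.
    The first data row is dropped if any counter column in it is blank.
    Column 0 is the timestamp, so only columns 1..n are checked.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = [row for row in reader if row]  # ignore completely empty lines
    if rows and any(not cell.strip() for cell in rows[0][1:]):
        rows = rows[1:]
    return header, rows
```

To use it on a real file, open the file with `newline=""` and pass the file object in. Only the first row is checked, so a counter that is legitimately blank later in the log is left alone.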

4. Combining several BLGs into one is limited to 32 files and has poor error messages 

Let's say I set up a rolling log in logman to generate a new BLG file every 10 minutes. I want to examine a period of several hours so I take the BLG files generated from that time period. Using relog, I can combine these files into one BLG so that I don't have to open lots of windows and try to understand what happened in what order. In my case, I have 37 BLG files. When I run the command to combine the files, I get a strange error:

The error "The specified log file type has not been installed on this computer" appears to be a catch-all error for relog. One would think that maybe one of the files is corrupt, but I can combine the files in two separate groups and then combine the two results without an issue:

C:\Temp\2013-08-08\origin0>relog perfcounters_08080*.blg -o combined0.blg

Input
----------------
File(s):
     perfcounters_08080451.blg (Binary)
     perfcounters_08080501.blg (Binary)
     perfcounters_08080511.blg (Binary)
     perfcounters_08080521.blg (Binary)
     perfcounters_08080531.blg (Binary)
     perfcounters_08080541.blg (Binary)
     perfcounters_08080551.blg (Binary)
     perfcounters_08080601.blg (Binary)
     perfcounters_08080611.blg (Binary)
     perfcounters_08080621.blg (Binary)
     perfcounters_08080631.blg (Binary)
     perfcounters_08080641.blg (Binary)
     perfcounters_08080651.blg (Binary)
     perfcounters_08080701.blg (Binary)
     perfcounters_08080711.blg (Binary)
     perfcounters_08080721.blg (Binary)
     perfcounters_08080731.blg (Binary)
     perfcounters_08080741.blg (Binary)
     perfcounters_08080751.blg (Binary)
     perfcounters_08080801.blg (Binary)
     perfcounters_08080811.blg (Binary)
     perfcounters_08080821.blg (Binary)
     perfcounters_08080831.blg (Binary)
     perfcounters_08080841.blg (Binary)
     perfcounters_08080851.blg (Binary)
     perfcounters_08080901.blg (Binary)
     perfcounters_08080911.blg (Binary)
     perfcounters_08080921.blg (Binary)
     perfcounters_08080931.blg (Binary)
     perfcounters_08080941.blg (Binary)
     perfcounters_08080951.blg (Binary)

Begin:    8/8/2013 4:51:02
End:      8/8/2013 10:01:21
Samples:  18623

100.00%

Output
----------------
File:     combined0.blg

Begin:    8/8/2013 4:51:02
End:      8/8/2013 10:01:21
Samples:  18623

The command completed successfully.

C:\Temp\2013-08-08\origin0>relog perfcounters_08081*.blg -o combined1.blg

Input
----------------
File(s):
     perfcounters_08081001.blg (Binary)
     perfcounters_08081011.blg (Binary)
     perfcounters_08081021.blg (Binary)
     perfcounters_08081031.blg (Binary)
     perfcounters_08081041.blg (Binary)
     perfcounters_08081051.blg (Binary)

Begin:    8/8/2013 10:01:22
End:      8/8/2013 11:01:24
Samples:  3605

100.00%

Output
----------------
File:     combined1.blg

Begin:    8/8/2013 10:01:22
End:      8/8/2013 11:01:24
Samples:  3605

The command completed successfully.

C:\Temp\2013-08-08\origin0>relog combined*.blg -o combined.blg

Input
----------------
File(s):
     combined0.blg (Binary)
     combined1.blg (Binary)

Begin:    8/8/2013 4:51:02
End:      8/8/2013 11:01:24
Samples:  22228

100.00%

Output
----------------
File:     combined.blg

Begin:    8/8/2013 4:51:02
End:      8/8/2013 11:01:24
Samples:  22228

The command completed successfully.

So perhaps the issue is the number of files. By removing files one at a time, I found that relog will handle a maximum of 32 input files. This limit is not documented, and the error message does nothing to communicate it.
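The two-stage workaround I did by hand can be scripted. The following Python sketch splits the inputs into groups of at most 32, merges each group with relog, then merges the partial results. The limit of 32 is only what I observed, the `partial_*.blg` file names are made up for illustration, and the `run` parameter exists so the logic can be exercised without actually invoking relog.

```python
import subprocess

MAX_INPUTS = 32  # limit observed by trial and error; not documented

def chunk(items, size=MAX_INPUTS):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def combine_blgs(files, output, run=subprocess.run):
    """Merge BLG files into `output`, never passing relog more than 32 inputs."""
    files = sorted(files)
    stage = 0
    # Keep merging in groups until the remaining files fit in one relog call.
    while len(files) > MAX_INPUTS:
        merged = []
        for n, group in enumerate(chunk(files)):
            partial = f"partial_{stage}_{n}.blg"  # hypothetical temp name
            run(["relog", *group, "-o", partial], check=True)
            merged.append(partial)
        files = merged
        stage += 1
    run(["relog", *files, "-o", output], check=True)
```

With my 37 files this issues three relog commands: one for the first 32 files, one for the remaining 5, and a final one that merges the two partial logs.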

5. Combining BLG files from multiple machines does not work well

While this may be stretching what relog was intended to do, I think there is a perfectly reasonable case for wanting to do this. If you have a cluster of machines performing the same function, you may want to get an average CPU or a total throughput for the whole cluster. I do this all the time in Azure. If you collect the performance counters from each machine, it should be straightforward to combine the data. Unfortunately, it's not. The counter names in the BLG include the machine name, so when you combine the files, the counters from different machines stay separate instead of merging. You have to do this by hand:

  1. Convert each BLG file to a CSV
  2. In each CSV replace the machine name in the perf counter with a generic name like "server"
  3. Read the data from each CSV simultaneously using a rolling window and combine the perf counters in the time window

In other words, forget about it and pay for a tool that will do it for you. 
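For anyone who wants to try anyway, steps 2 and 3 above can be roughed out in Python. This is a sketch under assumptions, not a finished tool: it assumes counter paths look like `\\MACHINE\Object(instance)\Counter`, that you have already parsed the CSV rows into `(timestamp, counter_path, value)` tuples, and that a fixed window (10 seconds here) is good enough to line up samples from different machines. All function names are my own.

```python
import re
from collections import defaultdict
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def generic_name(counter_path, generic="server"):
    r"""Replace the machine in \\MACHINE\Object\Counter with a generic name."""
    return re.sub(r"^\\\\[^\\]+", lambda m: "\\\\" + generic, counter_path)

def window_start(timestamp, window_seconds):
    """Round a timestamp down to the start of its window (seconds since 1970)."""
    seconds = int((timestamp - EPOCH).total_seconds())
    return seconds - seconds % window_seconds

def average_by_window(samples, window_seconds=10):
    """Average samples from all machines per (window, generic counter).

    `samples` is an iterable of (timestamp, counter_path, value) tuples.
    Returns a dict mapping (window_start, generic_counter) to the mean value.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for timestamp, path, value in samples:
        key = (window_start(timestamp, window_seconds), generic_name(path))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

An average makes sense for something like CPU. For a cluster-wide throughput you would sum the values in each window instead of dividing by the count.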

6. Relog has trouble reading dates from the command line

Look at the following output:

c:\test>relog perf1.blg -f csv -o perf1.csv -b "8/31/2013 0:52:14" -e "8/31/2013 1:02:41"

Input
----------------
File(s):
     perf1.blg (Binary)

Begin:    8/31/2013 0:52:16
End:      8/31/2013 1:02:41
Samples:  626

100.00%

Output
----------------
File:     blah.csv

Begin:    8/31/2013 0:52:16
End:      8/31/2013 1:02:41
Samples:  625

The command completed successfully.

c:\test>relog perf1.blg -f csv -o perf1.csv -b "8/31/2013 12:52:14AM" -e "8/31/2013 1:02:41AM"

Input
----------------
File(s):
     perf1.blg (Binary)

Begin:    8/31/2013 0:52:16
End:      8/31/2013 1:02:41
Samples:  626

Error: The time range specified has no overlap with the input logs.

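The relog documentation says you can use timestamps with AM/PM but does not explicitly say you can use 24-hour time. Yet the 24-hour form is the one that works, and the documented AM/PM form fails. Also, the Begin and End values that relog reports for the input are identical in both runs, so the output gives no hint as to why one command succeeds and the other does not.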
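Since the 24-hour form is the one that worked for me, a script that drives relog can build its timestamps in exactly that shape. Here is a small Python sketch. The helper names are my own, and I am assuming relog accepts the same `M/d/yyyy H:mm:ss` form on other machines as it did on mine, which may well depend on regional settings.

```python
from datetime import datetime

def relog_time(moment):
    """Format a datetime as M/d/yyyy H:mm:ss in 24-hour time, no AM/PM."""
    return (f"{moment.month}/{moment.day}/{moment.year} "
            f"{moment.hour}:{moment.minute:02d}:{moment.second:02d}")

def relog_csv_command(blg, csv_path, begin, end):
    """Build the argument list for converting a time slice of a BLG to CSV."""
    return ["relog", blg, "-f", "csv", "-o", csv_path,
            "-b", relog_time(begin), "-e", relog_time(end)]
```

The returned list can be handed to `subprocess.run`. Passing the arguments as a list avoids having to quote the timestamps, which contain a space.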