Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond Type Mismatch

I have had the following in my notes for a while… and I have not blogged in a while (been too busy) so I decided to blog it today, before the topic gets too old and starts stinking Smile

 

It all started when a customer showed me an Alert he was seeing in his environment from some XPlat workflow. The alert looks like the following:

Generic Performance Mapper Module Failed Execution
Alert Description Source: RLWSCOM02.domain.dom
Module was unable to convert parameter to a double value
Original parameter: '$Data///*[local-name()="BytesPerSecond"]$'
Parameter after $Data replacement: ''
Error: 0x80020005
Details: Type mismatch.
One or more workflows were affected by this.
Workflow name: Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection
Instance name: /
Instance ID: {4F6FA8F5-C56F-4C9B-ED36-12DAFF4073D1}
Management group: DataCenter
Path: RLWSCOM02.domain.dom\RLWSCOM02.domain.dom Alert Rule: Generic Performance Mapper Module Runtime Failure Created: 6/28/2010 11:30:28 PM

 

First I stumbled into this forum post which mentions he same symptom https://social.technet.microsoft.com/Forums/en-US/crossplatformgeneral/thread/62e0bf3e-be6f-4218-a37b-f1e66f02aa49 - but when looking at the resolution, the locale on the customer machine was good (== set to US settings), so I concluded that it was not the same root cause.

 

Then I looked at what that rule was supposed to do, and queried the same CIM class both remotely thru WS-Man and locally via CIM, and concluded that my issue was that certain values were returning as NULL while we were expecting to see a number on the Management Server – therefore the Type Mismatch!

I have explained previously how to run CIM queries against the XPlat agent; in this case it was the following one:

winrm enumerate https://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX\_FileSystemStatisticalInformation?\_\_cimnamespace=root/scx -username:scomuser -password:password -r:https://rllspago01.domain.dom:1270/wsman -auth:basic –skipCACheck -skipCNCheck

 

SCX_FileSystemStatisticalInformation

AverageDiskQueueLength = null

AverageTransferTime = null

BytesPerSecond = null

Caption = File system information

Description = Performance statistics related to a logical unit of secondary storage

ElementName = null

FreeMegabytes = 4007

IsAggregate = false

IsOnline = true

Name = /

PercentBusyTime = null

PercentFreeSpace = 55

PercentIdleTime = null

PercentUsedSpace = 45

ReadBytesPerSecond = null

ReadsPerSecond = null

TransfersPerSecond = null

UsedMegabytes = 3278

WriteBytesPerSecond = null

WritesPerSecond = null

 

See the NULLs ? Those are our issue.

Now, before you continue reading, I will tell you that I have investigated this also internally, and apparently we have just (in Cumulative Update 3) changed this behaviour in our XPlat modules, so that when NULL is returned, we consider it to be ZERO. Good or bad that is, it will at least take care of the error. But if you don’t get any data from the Unix system… well, you are not getting any data – so that might cause a surprise later on when you go and look at those charts and expect to see your disk “performance counters” but in fact all you have is a bunch of ZERO’s (how very interesting!). So, basically, the fix in CU3 suppresses the symptom, but does not address the cause.

So, let’s see what is actually causing this, as you might well want to get those statistics, or probably you would not be monitoring that server!

I looked at the Cimd.log (set to verbose) only says the following (basically not much: is getting info for 3 partitions… and the provider code is working)

2010-09-01T08:38:32,796Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances()

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] Object Path = //rllspago01.domain.dom/root/scx:SCX_FileSystemStatisticalInformation

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Calling DoEnumInstances()

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider DoEnumInstances

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider GetDiskEnumeration - type 3

2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - DoEnumInstances() returned - 3

2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Call ReturnDone

2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - return OK

2010-09-01T08:38:33,360Z Trace      [scx.core.provsup.cmpibase.singleprovider.DiskProvider:5964:3086830480] SingleProvider::EnumInstances() - Returning - 0

 

but it still did not give me an idea as to why we would not get data for those “counters”. A this point I stopped using complex troubleshooting techniques and simply turned intuition on, and tried with some help from a search engine: https://www.bing.com/search?q=How+do+I+find+out+Linux+Disk+utilization 

the results I got all mentioned that on Linux you would use the “iostat” command.

So I tried to use and… lol and behold: the iostat commend was NOT INSTALLED on that machine!

Guess what? We installed it (it is included in the “sysstat” package for RedHat linux, so a simple “yum install sysstat” took care of this) and the counters started working!

Hope that is useful to some.