I am testing the beta bits of the cross-platform extensions that were released on connect.microsoft.com.
This is my limited testing so far (I hope it benefits everyone, since much of this is currently still “undocumented”):
After installing the agent the way the documentation says (it’s just an RPM package, not much can go wrong with that), I tried to figure out how things are laid out on the Linux agent. It is all pretty understandable if you look around on the machine (documented or not, Linux and open source stuff is easy to figure out):
Basically the “agent” is not really an “agent” the way the Windows agent is, since it does not “send” anything to the Management Server on its own. It consists of a couple of services/daemons based on existing open source projects, but configured in their own folder, with their own names, and using different ports than a standard install of those would, so as not to conflict with any existing installation.
The Management Server uses these services remotely (similar to agentless monitoring of a Windows box). The two services are:
- scx-cimd, which implements the CIM daemon (openpegasus.org)
- scx-wsmand, which implements the WS-Man daemon (openwsman.org)
It is easy to figure out how they are laid out. Even if it is undocumented, you can look at the processes and figure out WHERE they live (/opt/microsoft/scx/bin/….) and where their configuration files are located (/etc/opt/microsoft/scx/conf …).
The files are self-explanatory, and documentation is available:
- for wsmand
  - at openwsman.org
- for cimd
  - at the openpegasus site (http://www.openpegasus.org/documents.tpl?CALLER=doc.tpl&dcat= )
  - on the openpegasus wiki (http://wiki.opengroup.org/pegasus-wiki/doku.php?id=start )
  - at the Linux management IBM page (http://www.ibm.com/developerworks/linux/library/os-ltc-systemsmanagement/ )
I still have to delve into them properly as I would like to, but I already figured out a bunch of interesting things by quickly looking at them.
Agent Communication
Someone must have decided to “recycle” the 1270 port number that was used in MOM2005 🙂 Basically openwsman runs an SSL listener (with basic auth, connected via a PAM module to the “regular” Unix /etc/passwd users). All that happens is that the Management Server runs WS-Man queries and commands over this channel. The Management Server connects to the agent on port 1270 using SSL every time, authenticates as “root” (or as the specified “Action Account”) and does its work, or asks the agent to do it. So the communication goes from the Management Server to the agent, not the other way around as it does with Windows “agents”. That’s why it feels more like an “agentless” thing to me, at least as far as the “direction” of traffic and who does the actual querying are concerned.
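To make the “the Management Server calls the agent” direction a bit more concrete, here is a minimal sketch (mine, not anything shipped with the product) of a client sending a WS-Management Identify request to the listener on port 1270 with basic auth over SSL. The function names are made up for illustration, the /wsman path is taken from the winrm test further down, and I am assuming the listener answers a plain Identify call; certificate verification is switched off only because the agent uses a self-issued certificate.

```python
import base64
import http.client
import ssl

# WS-Management Identify request body (namespaces as defined by the DMTF
# WS-Management specification).
IDENTIFY_BODY = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" '
    'xmlns:wsmid="http://schemas.dmtf.org/wbem/wsman/identity/1/wsmanidentity.xsd">'
    '<s:Header/><s:Body><wsmid:Identify/></s:Body></s:Envelope>'
)


def build_identify_request(user, password):
    """Return (headers, body) for an Identify call using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/soap+xml;charset=UTF-8",
        "Authorization": f"Basic {token}",
    }
    return headers, IDENTIFY_BODY.encode("utf-8")


def send_identify(host, user, password, port=1270, path="/wsman"):
    """POST the Identify request over SSL and return (status, response text)."""
    # Lab use only: the agent's certificate is self-issued, so verification
    # is disabled here. Trust the certificate properly for anything real.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    headers, body = build_identify_request(user, password)
    conn = http.client.HTTPSConnection(host, port, context=context, timeout=10)
    try:
        conn.request("POST", path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", "replace")
    finally:
        conn.close()
```

Calling something like send_identify("centos", "root", "password") from any machine that can reach the agent would be the equivalent of what the Management Server does on every poll: it opens the connection, authenticates, asks, and closes.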
For the rest, the provided Management Packs have “normal” discoveries and “normal” monitors. Much like the Windows Management Packs often discover things by querying WMI, here they use WS-Man to run CIM queries against the Unix boxes.
The Service Model is totally cool to actually *SEE* in action, don’t you think?
Some debugging/troubleshooting information:
I searched a bit and found the openwsman.org documentation and forum to be useful to figure some things out. For example I banged my head a few times before managing to actually TEST a query from windows to linux using WINRM. THIS document http://openwsman.org/openwsman-users-guide/vista-winrm-over-openwsman-setup helped a lot. Of course you have to solve some other things such as DNS resolution AND trusting the self-issued certificates that the agent uses as well…
Running test queries:
This is how I tested what the discovery for a Red Hat Linux computer type should return (I found the query by opening the MP in the Authoring Console, as one would usually do), run FROM the Windows box with WINRM:
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:root -password:password -r:https://centos:1270/wsman -auth:basic
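If you want to try other classes, the command line above is easy to get wrong by hand, so here is a small sketch that assembles it from its parts. The helper names are mine; the resource URI prefix, the root/scx namespace, the port and the switches are simply copied from the command above, and I have not checked which other classes the agent exposes.

```python
def resource_uri(class_name, namespace="root/scx"):
    """Build the WS-Man resource URI for a CIM class in a given namespace."""
    base = "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/"
    return f"{base}{class_name}?__cimnamespace={namespace}"


def winrm_enumerate_args(class_name, host, user, password,
                         namespace="root/scx", port=1270):
    """Return the winrm enumerate command as a list of arguments."""
    return [
        "winrm", "enumerate", resource_uri(class_name, namespace),
        f"-username:{user}", f"-password:{password}",
        f"-r:https://{host}:{port}/wsman", "-auth:basic",
    ]


if __name__ == "__main__":
    # Prints the same command line that is shown above.
    print(" ".join(winrm_enumerate_args(
        "SCX_OperatingSystem", "centos", "root", "password")))
```

The output can be pasted into a command prompt on the Windows box; swapping the class name is then a one-word change.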
If you need to test the query directly *ON* the Linux box (querying the CIMD instead of WSMAND), the WBEMEXEC utility is packaged with the agent (under /opt/microsoft/scx/bin/tools ). It is not as easy as some Windows administrators (who have used WBEMTEST or WMI Tools in the past) would hope, but it is not that bad either. The tool is not interactive, so even just to run a few queries against the CIM daemon locally you need to create an XML file that looks like the following (basically you build the RAW request the way the CIMD accepts it):
<?xml version="1.0" ?>
<CIM CIMVERSION="2.0" DTDVERSION="2.0">
<MESSAGE ID="50000" PROTOCOLVERSION="1.0">
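Only the first lines of the request file are shown above. As a rough sketch of what a complete request of this kind can look like, the following generates a CIM-XML EnumerateInstances call following the DMTF “CIM Operations over HTTP” layout. Note the assumptions: the choice of EnumerateInstances, the SCX_OperatingSystem class and the root/scx namespace are borrowed from the winrm test above, not from the original file, and the function and file names are made up.

```python
import xml.etree.ElementTree as ET


def build_enumerate_instances(class_name, namespace="root/scx", message_id="50000"):
    """Return a CIM-XML EnumerateInstances request as a string."""
    cim = ET.Element("CIM", CIMVERSION="2.0", DTDVERSION="2.0")
    message = ET.SubElement(cim, "MESSAGE", ID=message_id, PROTOCOLVERSION="1.0")
    request = ET.SubElement(message, "SIMPLEREQ")
    call = ET.SubElement(request, "IMETHODCALL", NAME="EnumerateInstances")
    # The namespace is sent as one NAMESPACE element per path component.
    path = ET.SubElement(call, "LOCALNAMESPACEPATH")
    for part in namespace.split("/"):
        ET.SubElement(path, "NAMESPACE", NAME=part)
    # The class to enumerate is passed as the "ClassName" input parameter.
    param = ET.SubElement(call, "IPARAMVALUE", NAME="ClassName")
    ET.SubElement(param, "CLASSNAME", NAME=class_name)
    return '<?xml version="1.0" ?>\n' + ET.tostring(cim, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file name; pass whatever file you create to the tool.
    with open("enumerate_os.xml", "w", encoding="utf-8") as handle:
        handle.write(build_enumerate_instances("SCX_OperatingSystem"))
```

Generating the file this way at least guarantees that every tag is closed, which is the easiest thing to get wrong when typing raw requests by hand.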
Once you have made such a file, you can execute the query in the file with the tool like the following:
As you can see from this, CIMD already uses HTTP. This differs from WMI on Windows, which uses RPC/DCOM. In a way, this is much simpler to troubleshoot, and more firewall-friendly.
I have not really found an activity or debug log for any of those components, yet… but in the end they are not doing anything ON THEIR OWN, unless asked by the MS…. So the “healthservice” logic is all on the MS anyway. Errors about failed discoveries, permissions of the Action Account user, and anything else will be logged by the HealthService on the Windows machine (the Management Server) that is actually performing monitoring towards the Unix box.
It really is *just* a matter of getting the WMI and WinRM-equivalent layer on Linux/Unix up and running. After that, everything is done from Windows anyway!
Once this common management infrastructure is in place, third parties will be able to write *just* MPs, without having to worry about the TRANSPORT anymore.
As you have probably noticed from the screenshots and command lines, I don’t have a “real” Red Hat Enterprise Linux or other “supported” Linux distribution. Therefore I started my testing with CentOS 5 (which is very similar to RHEL 5). The agent installed fine, as you can see, but nothing was really being “discovered”: the MP had only found a “linux computer” and was not finding any “RedHat”, “SuSe” or other “Operating System” instances. If you are somewhat familiar with the way Operations Manager targeting works, you know that monitors are targeted at object classes. If no instances of those classes are discovered, NO MONITORING actually happens, even if the infrastructure is in place:
Therefore my machine was not being monitored. This is the typical unsupported solution.
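For those less familiar with targeting, here is a toy model of why nothing ran. It is not how Operations Manager is implemented, just an illustration of the rule: a monitor only runs against discovered instances of the class it targets. All class and monitor names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedObject:
    host: str
    class_name: str


@dataclass(frozen=True)
class Monitor:
    name: str
    target_class: str


def monitors_that_run(objects, monitors):
    """Pair each discovered object with the monitors targeted at its class."""
    return [
        (obj.host, mon.name)
        for obj in objects
        for mon in monitors
        if mon.target_class == obj.class_name
    ]


# Hypothetical monitor targeted at the operating system class.
monitors = [Monitor("OS health (hypothetical)", "RedHat Operating System")]

# What I had at first: only the computer object was discovered.
only_computer = [ManagedObject("centos", "Linux Computer")]

# What is needed: an operating system instance discovered as well.
with_os = only_computer + [ManagedObject("centos", "RedHat Operating System")]
```

With only_computer the function returns an empty list, so no monitor runs; once the operating system instance exists, the monitor is paired with the host.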
In the end, I actually even got it to work, but I had to create a new Management Pack (exporting and modifying the RHEL5 one as a base) that would actually search for different Property values and discover CentOS instead as if it were RedHat:
After importing my hacked Management Pack the machine started to be monitored. Here you can see Health Explorer in all of its glory:
Of course this is a hack I made just to have a test setup somewhat working and to familiarize myself with the SCX components. There is no guarantee that my Management Pack actually works on CentOS the way it is supposed to, or that there aren’t other, more subtle differences between RedHat and CentOS that will make it fail. I only modified a couple of Discoveries to let it discover the “Operating System” instance; everything else should follow, but not necessarily. One difference you can already see in the screenshot above is that I am not yet seeing the hardware being monitored, so my hack is only partially working. It is also definitely something that won’t be supported, so I cannot provide it here. Also, this is a beta, so I think the Management Packs will be re-released with the following beta versions, and this change would need to be redone all over again.
But I could not wait to see this working while waiting two business days (we are on a weekend!) for confirmation that I am allowed to download a 30-day trial of the real Red Hat Enterprise Linux.