This is part of a multi-part article (if you didn't gather that from the title). For easy reference, here are all the parts.
Scenario #2 - "SSH discovery failed." with "unspecified problem"
After fixing the issue with not being able to read the agents directory, the discovery details in the UI show this:
Obviously that's not very helpful, so I go back to the DebugView output. Starting from the end and going up, I'm looking for something that might give me a clue as to why the task failed. Unfortunately, I don't see anything definitive, but I'm narrowing down what might be the area of the problem. Here are the last few lines in the DebugView output:
Beginning ExecuteOSInformationScript on thread id: 7
7| Executing DiscoveryScript.Discovery.Task
7| Returned from DiscoveryScript.Discovery.Task
7| DiscoveryScript.Discovery.Task returned as succeeded
7| Return from DiscoveryTaskHelper.ExecuteOSInformationScript()
9218a54c-5961-4fc0-bd46-99fbe8d543b6 | 7 | Return from DiscoveryTaskHelper.ExecuteSSHDiscovery()
Microsoft.MOM.UI.Console.exe Error: 0 :
9218a54c-5961-4fc0-bd46-99fbe8d543b6 | 7 | 10.10.10.17: ExecuteSSHDiscovery failed with exception: <stdout></stdout><stderr></stderr><exception>Unspecified problem</exception>
Beginning OnDiscoveryTaskCompleted on thread id: 1
Beginning startNextDiscoveryTask on thread id: 1
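The failure line wraps three fields in tags, and it helps to pull them apart before reading too much into it. Here's a minimal Python sketch that does this; the function name parse_discovery_failure is my own invention for illustration, and the payload string is copied from the DebugView line above.

```python
import re

# Payload copied from the DebugView failure line above.
PAYLOAD = "<stdout></stdout><stderr></stderr><exception>Unspecified problem</exception>"


def parse_discovery_failure(payload):
    """Return the text inside each <tag>...</tag> pair as a dict (hypothetical helper)."""
    fields = {}
    for tag, text in re.findall(r"<(\w+)>(.*?)</\1>", payload, flags=re.DOTALL):
        fields[tag] = text.strip()
    return fields


fields = parse_discovery_failure(PAYLOAD)
print(fields)
if not fields.get("stdout") and not fields.get("stderr"):
    # Both streams are empty, so the exception text is the only clue in this line.
    print("No script output captured; only the exception text is available.")
```

With both stdout and stderr empty, the line carries no output from the script itself, which is why the other logs are worth checking.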
Comparing the DebugView output to the flowchart above, it looks like the process is at the "Copy GetOSVersion.sh script" step: it copies the script to the computer and runs it, but an error occurs at that point. Using WinSCP, another tool in my kit, I browse the file system of the remote Linux computer and see that /tmp/scx-root/GetOSVersion.sh is indeed there, so the file copy worked. I check the permissions on the file and see it's set to "-rw-r--r--", but that's OK: the MP uses the "sh" command to run the script, so it doesn't need execute permissions.
Going to the module debug logs (enabled with EnableOpsMgrModuleLogging), I see the DeployFile.vbs.log file was updated recently. Here's what it says:
Transferring file: C:\Program Files\System Center Operations Manager 2007\AgentManagement\UnixAgents\GetOSVersion.sh to location: /tmp/scx-root/
Verifying that file: GetOSVersion.sh was transferred properly
/tmp/scx-root/GetOSVersion.sh
So that looks ok. Let's check the SSHCommandProbe.log file:
Leave SSHCommandProbe::DoProcess
Enter SSHCommandProbe::EnterDoInit
XML_INIT_CALL
GetEventId
Exit SSHCommandProbe::DoInit
Enter SSHCommandProbe::DoProcess (YES!!!)
SSHCommandProbe::DoProcess passed initial arguments checking
SSHCommandProbe::DoProcess preparing SSH call
centos55-x86
22
root
sh /tmp/scx-$USER/GetOSVersion.sh; EC=$?; rm -rf /tmp/scx-$USER; exit $EC
Enter SSHFacade::RunCommand
ExpectedSSHFacadeException
Unspecified problem
Enter initDataHolder
Enter initDataType
initDataType initializing output datatype
Leave initDataType
Leave initDataHolder
Leave SSHCommandProbe::DoProcess
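The command in that log runs the script with sh, saves its exit code in EC, removes the /tmp/scx-$USER directory, and then exits with the saved code, so the cleanup doesn't mask a script failure. Here's a minimal Python sketch of the same pattern, assuming a POSIX machine with sh on the PATH. The temporary directory and the one-line script bodies are hypothetical stand-ins, not the real GetOSVersion.sh, and run_then_cleanup is a name I made up for the demo. It also sets the file to -rw-r--r-- to show that sh runs it without the execute bit.

```python
import os
import subprocess
import tempfile


def run_then_cleanup(script_body):
    """Run a non-executable script with sh, delete its directory, and
    return (exit code, whether the directory still exists)."""
    # Hypothetical stand-in for /tmp/scx-$USER on the remote machine.
    workdir = tempfile.mkdtemp(prefix="scx-demo-")
    script = os.path.join(workdir, "GetOSVersion.sh")
    with open(script, "w") as handle:
        handle.write(script_body)
    os.chmod(script, 0o644)  # -rw-r--r--: no execute bit set

    # Same shape as the logged command: run, save $?, clean up, exit with the saved code.
    command = 'sh "$1/GetOSVersion.sh"; EC=$?; rm -rf "$1"; exit $EC'
    result = subprocess.run(["sh", "-c", command, "sh", workdir])
    return result.returncode, os.path.exists(workdir)


print(run_then_cleanup("exit 3\n"))  # a failing stand-in script
print(run_then_cleanup("exit 0\n"))  # a succeeding stand-in script
```

In both cases the directory is gone afterwards, and the exit code that comes back is the script's, not the one from rm.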
The "Unspecified problem" in SSHCommandProbe.log looks suspicious, but it's not getting me any closer to figuring out the root cause here…
I'll try deleting the copied file and re-running discovery. I watch DebugView and the directory in WinSCP at the same time, and the file seems to take a long time to arrive (it should be really quick since I'm on a private LAN). I think what is happening is a timeout (as in my previous article: Can't get your computer discovered?). The Microsoft.Unix.DiscoveryScript.Discovery.Task has a timeout value of 20 seconds, so it's possible the network configuration is causing it to time out.
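One rough way to test a timeout theory like this is to time the individual network steps and compare them with the 20-second budget. Here's a minimal Python sketch, not part of OpsMgr: the helper names timed and check_host are hypothetical, and it only measures name resolution and the TCP connect to port 22 from whatever machine runs it, not the full SSH session.

```python
import socket
import time

TASK_TIMEOUT_SECONDS = 20  # timeout quoted above for the discovery task


def timed(step, *args):
    """Call step(*args) and return (elapsed seconds, result or the OSError raised)."""
    start = time.monotonic()
    try:
        outcome = step(*args)
    except OSError as error:  # covers DNS failures and connect timeouts
        outcome = error
    return time.monotonic() - start, outcome


def check_host(host, port=22):
    """Time name resolution and a TCP connect, then compare the total with the budget."""
    resolve_time, _ = timed(socket.getaddrinfo, host, port)
    connect_time, conn = timed(socket.create_connection, (host, port), TASK_TIMEOUT_SECONDS)
    if isinstance(conn, socket.socket):
        conn.close()
    total = resolve_time + connect_time
    return {
        "resolve": resolve_time,
        "connect": connect_time,
        "over_budget": total >= TASK_TIMEOUT_SECONDS,
    }
```

Calling check_host("centos55-x86") from the OpsMgr server would return the two timings; a resolve time of several seconds on a private LAN would point toward name resolution rather than the script.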
I go disable my secondary network adapter on my OpsMgr server so only my private LAN is active and I re-run discovery. Still fails. Hmm. I go check my network configuration on the CentOS machine and it looks like my DNS got configured to my corp network via DHCP on the other network adapter. I reconfigure the adapter on the private network to use the proper DNS, and…it looks like my problem is solved! But now I have a new problem…
More on that in part 3.