Problems with Live Migration

Hi,

Today my colleague Bernd Högen has an interesting finding that he would like to share:

Recently, I faced the following issue: on a Windows Server 2008 R2 SP1 Hyper-V host cluster with two CSVs, only the Coordinator Node was able to access the CSVs. The other nodes did not see the CSV volumes below C:\ClusterStorage.

The CSV Volumes failed over fine between the Cluster Nodes.

 

During the reboot of a cluster node, the event log showed several errors indicating that no domain controller (DC) was available:

--------

05/31/2011 07:10:54 PM  Error         5719    NETLOGON                         N/A             N/A                                This computer was not able to set up a secure session with a domain controller in domain due to the following:   There are currently no logon servers available to service the logon request.   This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.      ADDITIONAL INFO   If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

 

05/31/2011 07:11:28 PM  Error         5142    Microsoft-Windows-FailoverCluste Cluster Shared  NT AUTHORITY\SYSTEM                Cluster Shared Volume CSV1 is no longer accessible from this cluster node because of error 'ERROR_CANT_ACCESS_DOMAIN_INFO(1351)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

 

Because no DC was available when the cluster node started, the Cluster Network Name could not be authenticated, as the cluster log shows:

 

000004f0.00000ea4::2011/05/31-06:36:39.951 ERR   [RES] Network Name <Cluster Name>: Unable to find the active directory domain.  Sc: 1355.

000004f0.00000ea4::2011/05/31-06:36:39.951 ERR   [RES] Network Name <Cluster Name>: Can't get AD domain.  Error 1355.

000004f0.00000ea4::2011/05/31-06:36:39.951 INFO  [RES] Network Name <Cluster Name>: Getting a virtual computer account token.

000004f0.00000ea4::2011/05/31-06:36:39.951 INFO  [RES] Network Name <Cluster Name>: Resource object did not contain the cached AD Domain.  Obtaining.

000004f0.00000ea4::2011/05/31-06:36:39.951 ERR   [RES] Network Name <Cluster Name>: Unable to find the active directory domain.  Sc: 1355.

000004f0.00000ea4::2011/05/31-06:36:39.951 ERR   [RHS] Error 1355 from ResourceControl for resource Cluster Name.

0000082c.000009a4::2011/05/31-06:36:39.951 WARN  [RCM] ResourceControl(NETNAME_GET_VIRTUAL_SERVER_TOKEN) to Cluster Name returned 1355.

 

0000082c.00001010::2011/05/31-06:36:39.951 ERR   [DCM] Security was not initalized, cannot map disk CSV1

 

To mount a CSV successfully, a node has to authenticate with the CNO (Cluster Name Object / Cluster Network Name) against the CSV share of the Coordinator Node. Because the authentication of the CNO failed, the node could not authenticate against that share, and the CSV was not mapped. Apparently, the Cluster service did not retry the authentication against the Coordinator Node once the network was fully initialized.
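If you want to check whether a node hit the same sequence, the two log signatures quoted above can be searched for in a generated cluster log. The following is only a minimal Python sketch: the function name and the way the log file is passed in are my own assumptions, and the patterns are copied from the excerpts in this post (including the "initalized" spelling as it appears in the log).

```python
import re
import sys

# Patterns copied from the cluster log excerpts in this post.
NETNAME_ERROR = re.compile(
    r"\[RES\] Network Name <[^>]*>: Unable to find the active directory domain"
)
DCM_ERROR = re.compile(
    r"\[DCM\] Security was not initalized, cannot map disk (\S+)"
)


def find_csv_auth_failures(lines):
    """Count Network Name domain errors and collect the disks that could not be mapped."""
    netname_errors = 0
    unmapped_disks = []
    for line in lines:
        if NETNAME_ERROR.search(line):
            netname_errors += 1
        match = DCM_ERROR.search(line)
        if match and match.group(1) not in unmapped_disks:
            unmapped_disks.append(match.group(1))
    return {"netname_errors": netname_errors, "unmapped_disks": unmapped_disks}


if __name__ == "__main__":
    # Hypothetical usage: pass the path of a generated cluster log as the first argument.
    with open(sys.argv[1], errors="replace") as log_file:
        print(find_csv_auth_failures(log_file))
```

If both values are non-empty for the time of the node startup, the node most likely ran into the situation described here.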

 

We changed the startup type of the Cluster service to Automatic (Delayed Start), which fixed the problem. This can be done with the following command:

sc config clussvc start= delayed-auto

Run this on every cluster node if you experience this problem.
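For clusters with more than a couple of nodes, the command can be generated per node, because sc accepts a remote computer name in the form \\ServerName as its first argument. The small Python sketch below only builds the command lines and does not execute anything. The node names NODE1 and NODE2 are placeholders, and the helper name is my own.

```python
def delayed_start_commands(nodes, service="clussvc"):
    """Build one 'sc config' command line per node.

    Note the space after 'start=': sc expects the value as a separate token.
    """
    return [
        f"sc \\\\{node} config {service} start= delayed-auto"
        for node in nodes
    ]


if __name__ == "__main__":
    # Placeholder node names; replace them with the real cluster nodes.
    for command in delayed_start_commands(["NODE1", "NODE2"]):
        print(command)
```

Review the generated lines before running them from an elevated command prompt.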

Thanks Bernd!

Cheers

Robert