We need to talk (aka: “It’s not me, it’s you”) ...

Issue:

After installing an Administrative node of FAST Search Server 2010, additional non-administrative FAST Search Server 2010 nodes cannot communicate with the admin node. As a result, the non-admin nodes fail to download deployment.xml and therefore fail to finish configuration successfully.

What you can expect to see:

When running the psconfig.ps1 script on the non-admin nodes via PowerShell, you may see a failure similar to the following:

Please wait while Windows configures FAST Search Server for SharePoint. Configuration may take several minutes...

An error occurred while configuring IPSec - Could not connect to the admin node.

 This may be because of,

        1. Invalid admin node name

        2. Invalid baseport. Baseport of admin node and non-admin node must be same

        3. Admin node is not up and running

        4. Missing IPSec rules on admin node. If you added this host to the deployment.xml after running this script on the admin node, you need to rerun the IPSec cmdlet on the admin node

See C:\Install\FAST\log\fast-install.log for more details.

 

This is also evident if you simply attempt to run the Set-FASTSearchIPSec PowerShell cmdlet:

2010-07-15 04:03:19.170Z Verbose [15] SetIPSec - no.fast.middleware.SystemException: Unable to connect to the remote server ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 157.54.158.87:13390

   at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)

   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)

   --- End of inner exception stack trace ---

   at System.Net.HttpWebRequest.GetRequestStream()

   at no.fast.middleware.ConnectionManager.ExecuteMethod(HttpWebRequest request, MemoryStream dataStream)

   at no.fast.dsinterfaces.nameservice.NameserverStub.Resolve(String Name, String InterfaceType, String Version)

   --- End of inner exception stack trace ---

   at Microsoft.SharePoint.Search.Extended.Installer.Mahasen.Common.Middleware.MiddlewareOperations.DownloadDeploymentConfig(String localhost, String nameserverhost, String middleware_port, String fileName)

   at Microsoft.SharePoint.Search.Extended.Installer.Mahasen.Cmdlets.SetIPSecCommand.CreateNonAdminRules(IHostConfig hostConfig)

   at Microsoft.SharePoint.Search.Extended.Installer.Mahasen.Cmdlets.SetIPSecCommand.ProcessRecord()

Root Cause (Summary):

The non-admin node(s) attempt to communicate with the admin node on port 13390, which is secured via IPSEC. If the Kerberos authentication fails, the IPSEC connection is never established and the socket is never opened.
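As a quick sanity check before collecting IPSEC diagnostics, you can confirm from a non-admin node that the TCP connection to port 13390 really does fail. The following is a minimal sketch, not part of the FAST tooling: the function name `can_connect` and the default timeout are assumptions chosen for illustration, and a failed connection only tells you the socket could not be opened, not why.

```python
import socket


def can_connect(host: str, port: int = 13390, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    13390 is the port seen in the Set-FASTSearchIPSec log above; adjust it
    if your deployment uses a different baseport.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return False
```

For example, `can_connect("fastadmin.contoso.com")` (a hypothetical admin node name) returning False from a non-admin node is consistent with the SocketException shown above.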

How to Troubleshoot:

1. In these scenarios, we use the “Microsoft IPSEC Diagnostic Tool” (from https://support.microsoft.com/kb/943862) to gather additional diagnostic data specific to the IPSEC failures.

2. Once the data has been collected, open the wfpdiag.txt file and search for “ERROR_IPSEC_IKE_AUTH_FAIL”. You should see data similar to the following:

[2]01C0.15D4::07/15/2010-16:38:22.626 [user] |146.27.92.41|Peer failed with Windows error 13801(ERROR_IPSEC_IKE_AUTH_FAIL)

[2]01C0.15D4::07/15/2010-16:38:22.626 [ikeext] 1137|146.27.92.41|ProcessNotifyData: mmSa 0000000002F14600 cookie 71ddc30c state 2 messId 0

[3]01C0.15D4::07/15/2010-16:38:22.643 [ikeext] 1137|146.27.92.41|IKE diagnostic event:

Event Header:

  Timestamp: 1601-01-01T00:00:00.000Z

  Flags: 0x00000106

    Local address field set

    Remote address field set

    IP version field set

  IP version: IPv4

  IP protocol: 0

  Local address: 146.27.92.39

  Remote address: 146.27.92.41

  Local Port: 0

  Remote Port: 0

  Application ID:

  User SID: <invalid>

Failure type: IKE/Authip Main Mode Failure

Type specific info:

  Failure error code:0x000035e9

    IKE authentication credentials are unacceptable

  Failure point: Remote

  Flags: 0x00000002

    Multiple MM failures present

  Keying module type: Authip

  MM State: Second roundtrip (SSPI) packet sent

  MM SA role: Responder

  MM auth method: Unknown

  Cert hash:

0000000000000000000000000000000000000000

  MM ID: 0x0000000000000471

  MM Filter ID: 0x000000000004bfc3

[3]01C0.15D4::07/15/2010-16:38:22.644 [ikeext] 1137|146.27.92.41|IKE diagnostic event:

Event Header:

  Timestamp: 1601-01-01T00:00:00.000Z

  Flags: 0x00000106

    Local address field set

    Remote address field set

    IP version field set

  IP version: IPv4

  IP protocol: 0

  Local address: 146.27.92.39

  Remote address: 146.27.92.41

  Local Port: 0

  Remote Port: 0

  Application ID:

  User SID: <invalid>

Failure type: IKE/Authip Main Mode Failure

Type specific info:

  Failure error code:0x000035ec

    General processing error

  Failure point: Local

  Flags: 0x00000002

    Multiple MM failures present

  Keying module type: Authip

  MM State: Second roundtrip (SSPI) packet sent

  MM SA role: Responder

  MM auth method: Kerberos

  Cert hash:

0000000000000000000000000000000000000000

  MM ID: 0x0000000000000471

  MM Filter ID: 0x000000000004bfc3

[2]01C0.15D4::07/15/2010-16:38:22.644 [ikeext] 1137|146.27.92.41|Cleaning up mmSa: 0000000002F14600. Error 13801(ERROR_IPSEC_IKE_AUTH_FAIL)
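If the wfpdiag.txt file is large, the search can be scripted. The sketch below is only an illustration of scanning the text for the lines shown above; the function names are hypothetical, and it assumes the file has already been converted to plain text in the format of this excerpt. It also converts each "Failure error code" from hexadecimal to decimal, which makes it easier to match against the Windows error numbers in the log (0x000035e9 is 13801, the ERROR_IPSEC_IKE_AUTH_FAIL value).

```python
import re

AUTH_FAIL = "ERROR_IPSEC_IKE_AUTH_FAIL"
# Matches lines such as "  Failure error code:0x000035e9"
FAILURE_CODE = re.compile(r"Failure error code:\s*(0x[0-9a-fA-F]+)")


def summarize_wfpdiag(lines):
    """Return (lines mentioning the auth failure, failure codes as ints)."""
    auth_fail_lines = []
    codes = []
    for line in lines:
        if AUTH_FAIL in line:
            auth_fail_lines.append(line.strip())
        match = FAILURE_CODE.search(line)
        if match:
            # Convert the hexadecimal code to an integer (0x35e9 -> 13801).
            codes.append(int(match.group(1), 16))
    return auth_fail_lines, codes


def summarize_file(path):
    """Convenience wrapper; the file encoding is an assumption, so
    undecodable bytes are replaced rather than raising an error."""
    with open(path, errors="replace") as handle:
        return summarize_wfpdiag(handle)
```

Running `summarize_file("wfpdiag.txt")` on the excerpt above would report two lines containing ERROR_IPSEC_IKE_AUTH_FAIL and the failure codes 13801 and 13804.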

 

Root Cause (Details):

The data above shows the IPSEC negotiation failing during Main Mode authentication (ERROR_IPSEC_IKE_AUTH_FAIL, with Kerberos as the authentication method). This is primarily caused by one (or both) of the following Group Policy settings:

 

· A Group Policy setting that forces the Kerberos conversation to take place entirely over UDP (“MaxPacketSize”)

· A Group Policy setting that restricts the Maximum Token Size (“MaxTokenSize”)

 

To verify this, open an Administrative Command Prompt and run “GPRESULT -V” (no quotes). You should see an entry similar to the following:

GPO: Default Domain Policy

    KeyName: SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters\MaxTokenSize

    Value: 160, 134, 1, 0

    State: Enabled

 

This indicates that the Default Domain Policy is overriding the default MaxTokenSize value, which in turn interferes with our ability to successfully establish a connection via IPSEC.
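The “Value: 160, 134, 1, 0” line is easier to compare against the registry once it is converted to a single number. The sketch below assumes GPRESULT is listing the four bytes of a REG_DWORD in little-endian order (least significant byte first); the helper name `decode_dword` is made up for this example.

```python
def decode_dword(byte_list_text: str) -> int:
    """Convert a GPRESULT-style byte list such as "160, 134, 1, 0"
    into the integer it represents, assuming little-endian byte order."""
    parts = [int(piece) for piece in byte_list_text.split(",")]
    if len(parts) != 4 or any(not 0 <= b <= 255 for b in parts):
        raise ValueError("expected four comma-separated bytes (0-255)")
    return int.from_bytes(bytes(parts), byteorder="little")


value = decode_dword("160, 134, 1, 0")
print(value, hex(value))  # prints: 100000 0x186a0
```

Under that assumption, the policy in this example sets MaxTokenSize to 100000 (0x000186A0), which differs from the 0x0000FFFF (65535) value used in the resolution below.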

Resolution:

You may pursue the following options to resolve this issue:

 

· Exclude the FAST Search Server 2010 machines from having this setting applied to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters\MaxTokenSize

 

· Have the FAST Server administrators manually set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters\MaxTokenSize in the Registry to a value of 0x0000FFFF, and then lock the key down so that future Group Policy applications do not overwrite this setting

 

Please refer to https://support.microsoft.com/kb/327825, “New resolution for problems with Kerberos authentication when users belong to many groups” for further information regarding the MaxTokenSize registry value.