July 10, 2018 Windows updates cause SQL startup issues due to "TCP port is already in use" errors

We have recently become aware of a regression in one of the TCP/IP functions that manages the TCP port pool. The regression was introduced in the July 10, 2018 Windows updates for Windows 7/Server 2008 R2, Windows Server 2012, and Windows 8.1/Server 2012 R2.

This regression may cause a restart of the SQL Server service to fail with the error "TCP port is already in use". We have also observed this issue preventing Availability Group listeners from coming online during both planned and unexpected failovers. When this occurs, you may observe errors similar to the following in the SQL ERRORLOGs:

Error: 26023, Severity: 16, State: 1.
Server TCP provider failed to listen on [ <IP ADDRESS> <ipv4> <PORT>]. Tcp port is already in use.
Error: 17182, Severity: 16, State: 1.
TDSSNIClient initialization failed with error 0x2740, status code 0xa. Reason: Unable to initialize the TCP/IP listener. Only one usage of each socket address (protocol/network address/port) is normally permitted.
Error: 17182, Severity: 16, State: 1.
TDSSNIClient initialization failed with error 0x2740, status code 0x1. Reason: Initialization failed with an infrastructure error. Check for previous errors. Only one usage of each socket address (protocol/network address/port) is normally permitted.
Error: 17826, Severity: 18, State: 3.
Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log.
Error: 17120, Severity: 16, State: 1.
SQL Server could not spawn FRunCommunicationsManager thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.

If the issue is impacting an Availability Group listener, you may also observe the following error in addition to those above:

Error: 26075, Severity: 16, State: 1.
Failed to start a listener for virtual network name '<LISTENER NAME>'. Error: 10048.

You may also observe the following errors in the Windows System logs:

The SQL Server (<INSTANCE NAME>) service entered the stopped state.
The SQL Server (<INSTANCE NAME>) service terminated with the following service-specific error:  Only one usage of each socket address (protocol/network address/port) is normally permitted.

And if the instance is part of a cluster:

Cluster resource 'SQL Server (<INSTANCE NAME>)' of type 'SQL Server' in clustered role 'SQL Server (<INSTANCE NAME>)' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
The Cluster service failed to bring clustered role 'SQL Server (<INSTANCE NAME>)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

This issue can also affect the creation of a new Availability Group listener. In that case, SQL Server Management Studio may report an error like the following:

The configuration changes to the availability group listener were completed, but the TCP provider of the instance of SQL Server failed to listen on the specified port [<LISTENER NAME>:<PORT>]. This TCP port is already in use. Reconfigure the availability group listener, specifying an available TCP port. For information about altering an availability group listener, see the “ALTER AVAILABILITY GROUP (Transact-SQL)” topic in SQL Server Books Online. (Microsoft SQL Server, Error: 19486)

For this scenario, you may see errors similar to the following in the SQL ERRORLOGs:

Error: 19476, Severity: 16, State: 4.
The attempt to create the network name and IP address for the listener failed. If this is a WSFC availability group, the WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator. Otherwise, contact your primary support provider.
The Service Broker endpoint is in disabled or stopped state.
Error: 26023, Severity: 16, State: 1.
Server TCP provider failed to listen on [ <IP ADDRESS> <PORT>]. Tcp port is already in use.
Error: 26075, Severity: 16, State: 1.
Failed to start a listener for virtual network name '<LISTENER NAME>:'. Error: 10048.
Stopped listening on virtual network name '<LISTENER NAME>:'. No user action is required.
Error: 10800, Severity: 16, State: 1.
The listener for the WSFC resource '<RESOURCE GUID>' failed to start, and returned error code 10048, 'Only one usage of each socket address (protocol/network address/port) is normally permitted.'. For more information about this error code, see "System Error Codes" in the Windows Development Documentation.
Error: 19452, Severity: 16, State: 1.
The availability group listener (network name) with Windows Server Failover Clustering resource ID '<RESOURCE GUID>', DNS name '<LISTENER NAME>', port <PORT> failed to start with a permanent error: 10048. Verify port numbers, DNS names and other related network configuration, then retry the operation.
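
As a rough aid for triage, the error numbers quoted above can be searched for in ERRORLOG text. The following is a minimal Python sketch, not an official tool: the names `PORT_CONFLICT_ERRORS` and `find_port_conflict_errors` are made up for illustration, and reading the log file (including choosing its encoding) is left to the caller. A match only indicates that the symptom is present; it does not prove that the July 10, 2018 regression is the cause.

```python
import re

# Error numbers quoted in this post for the "TCP port is already in use" symptom.
PORT_CONFLICT_ERRORS = {26023, 17182, 17826, 17120, 26075, 10800, 19452, 19476}

# Matches header lines such as "Error: 26023, Severity: 16, State: 1."
ERROR_LINE = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")


def find_port_conflict_errors(log_text):
    """Return the sorted error numbers from PORT_CONFLICT_ERRORS found in log_text."""
    found = set()
    for match in ERROR_LINE.finditer(log_text):
        number = int(match.group(1))
        if number in PORT_CONFLICT_ERRORS:
            found.add(number)
    return sorted(found)


sample = """Error: 26023, Severity: 16, State: 1.
Server TCP provider failed to listen on [ <IP ADDRESS> <ipv4> <PORT>]. Tcp port is already in use.
Error: 17182, Severity: 16, State: 1.
Error: 18456, Severity: 14, State: 8.
"""
print(find_port_conflict_errors(sample))  # [17182, 26023]
```

Several of these errors (for example 17182 and 17826) are generic network initialization failures, so treat any hit as a prompt to check which updates are installed rather than as a diagnosis.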

Solution:

The Windows team has released hotfixes that address this issue, and multiple customers have confirmed that these hotfixes resolved their issues related to this regression. The tables below list the KB articles for the updates that introduced the regression and the KB articles for the corresponding hotfixes.

For Windows 7/Server 2008 R2

KBs that introduced the regression | KBs that fix the regression
July 10, 2018 - KB4338818 (Monthly Rollup) | July 18, 2018 - KB4338821 (Preview of Monthly Rollup)
July 10, 2018 - KB4338823 (Security-only update) | Improvements and fixes - Windows 7 Service Pack 1 and Windows Server 2008 R2 Service Pack 1 (KB4345459)

For Windows Server 2012

KBs that introduced the regression | KBs that fix the regression
July 10, 2018 - KB4338830 (Monthly Rollup) | July 18, 2018 - KB4338816 (Preview of Monthly Rollup)
July 10, 2018 - KB4338820 (Security-only update) | Improvements and fixes - Windows Server 2012 (KB4345425)

For Windows 8.1/Server 2012 R2

KBs that introduced the regression | KBs that fix the regression
July 10, 2018 - KB4338815 (Monthly Rollup) | July 18, 2018 - KB4338831 (Preview of Monthly Rollup)
July 10, 2018 - KB4338824 (Security-only update) | Improvements and fixes - Windows 8.1 and Server 2012 R2 (KB4345424)

To resolve SQL Server service or Availability Group listener failures caused by this regression, you can install either of the applicable KBs that fix it. For example, if your system has KB4338815, you can install either KB4338831 or KB4345424. The difference between the two is that KB4345424 provides only the fix for the regression, whereas KB4338831 includes all of the fixes from KB4338815, the fix for the regression, and some additional quality improvements as a preview of the next Monthly Rollup update.
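
The "either applicable KB" rule can be expressed as a small lookup. The sketch below is illustrative only: `KB_TABLE` and `missing_fixes` are hypothetical names, the KB numbers are transcribed from the tables in this post, and the list of installed KBs is assumed to come from whatever inventory you already have. It knows only the KBs listed here, so a later update that supersedes them would not be recognized as a fix.

```python
# Regressing and fixing KBs per OS family, transcribed from the tables in this post.
KB_TABLE = {
    "Windows 7/Server 2008 R2": {
        "regressing": {"KB4338818", "KB4338823"},
        "fixes": {"KB4338821", "KB4345459"},
    },
    "Windows Server 2012": {
        "regressing": {"KB4338830", "KB4338820"},
        "fixes": {"KB4338816", "KB4345425"},
    },
    "Windows 8.1/Server 2012 R2": {
        "regressing": {"KB4338815", "KB4338824"},
        "fixes": {"KB4338831", "KB4345424"},
    },
}


def missing_fixes(installed_kbs):
    """Map each OS family that has a regressing KB but no fix KB to its candidate fixes."""
    installed = {kb.upper() for kb in installed_kbs}
    result = {}
    for family, kbs in KB_TABLE.items():
        has_regression = installed & kbs["regressing"]
        has_fix = installed & kbs["fixes"]
        if has_regression and not has_fix:
            result[family] = sorted(kbs["fixes"])
    return result


print(missing_fixes(["KB4338815"]))
# {'Windows 8.1/Server 2012 R2': ['KB4338831', 'KB4345424']}
```

An empty result means either that no listed regressing KB is installed or that one of the listed fixes is already present.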

In addition to the monthly rollup/security-only updates mentioned above, this regression was also introduced in updates for specific Windows 10/Server 2016 builds. Please note that the build-specific updates do not have a corresponding hotfix-only patch, so each build has only one applicable patch to address the regression, as noted in the table below.

KB that introduced the regression | KB that fixes the regression
July 10, 2018 - KB4338819 (OS Build 17134.165) | July 16, 2018 - KB4345421 (OS Build 17134.167)
July 10, 2018 - KB4338825 (OS Build 16299.547) | July 16, 2018 - KB4345420 (OS Build 16299.551)
July 10, 2018 - KB4338826 (OS Build 15063.1206) | July 16, 2018 - KB4345419 (OS Build 15063.1209)
July 10, 2018 - KB4338814 (OS Build 14393.2363) | July 16, 2018 - KB4345418 (OS Build 14393.2368)
July 10, 2018 - KB4338829 (OS Build 10240.17914) | July 16, 2018 - KB4345455 (OS Build 10240.17918)
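
For the Windows 10/Server 2016 builds, the OS build number alone is enough to tell which side of the fix a machine is on. The following Python sketch encodes the build revisions from the table above. The names `BUILD_REVISIONS` and `classify_build` are hypothetical, and the comparison assumes that revisions within a build only increase and that later revisions keep the fix, which this post does not state explicitly.

```python
# (revision that introduced the regression, revision that fixes it) per build,
# transcribed from the Windows 10/Server 2016 table in this post.
BUILD_REVISIONS = {
    17134: (165, 167),
    16299: (547, 551),
    15063: (1206, 1209),
    14393: (2363, 2368),
    10240: (17914, 17918),
}


def classify_build(os_build):
    """Classify a 'build.revision' string such as '14393.2363'."""
    build_text, revision_text = os_build.split(".")
    build, revision = int(build_text), int(revision_text)
    if build not in BUILD_REVISIONS:
        return "not listed"
    regressed, fixed = BUILD_REVISIONS[build]
    # Assumption: revisions are cumulative, so anything at or past the fix keeps it.
    if revision >= fixed:
        return "fix or later"
    if revision >= regressed:
        return "regression present"
    return "before regression"


print(classify_build("14393.2363"))  # regression present
print(classify_build("14393.2368"))  # fix or later
```

A result of "not listed" simply means the build does not appear in the table; it says nothing about whether that build is affected.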

Other causes, unrelated to the regression described above, can also produce "TCP port is already in use" errors that prevent SQL resources from starting or coming online. If you are encountering similar errors but do not have the July 10, 2018 updates installed on your system, or you already have the fix installed, you may find our colleague Chris Thompson's blog post (https://blogs.msdn.microsoft.com/sql_pfe_blog/2016/10/05/tcp-port-is-already-in-use/) useful in identifying whether any other process(es) may be using the port meant for your SQL instance(s).
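
A quick way to see whether a port can currently be bound at all is to try binding it yourself. The sketch below is a generic Python probe and is not taken from the blog post linked above. The function name `port_is_free` is made up for illustration, and port 1433 in the usage example is only the SQL Server default, so substitute your instance's port. Run it on the SQL Server host while the SQL Server service is stopped; otherwise SQL Server itself will hold the port.

```python
import errno
import socket


def port_is_free(host, port):
    """Try to bind a TCP socket to (host, port); return True if the bind succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            # 10048 is the Winsock "address already in use" code quoted in the errors above.
            if exc.errno in (errno.EADDRINUSE, 10048):
                return False
            raise  # Any other failure (e.g. permissions) is reported to the caller.
        return True


if __name__ == "__main__":
    # Example only: 1433 is the default SQL Server port; use your instance's port.
    print(port_is_free("127.0.0.1", 1433))
```

The probe reports only that the bind failed. It cannot tell a genuine conflict with another process apart from the regression described in this post, so pair it with a check of the installed updates.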