SMB 2.x and SMB 3.0 Timeouts in Windows

This blog talks about common timeouts for SMB dialects 2.x and 3.0 [MS-SMB2] in Windows. It also covers continuous availability timeout, witness keep alive [MS-SWN], and some SMB-Direct timers [MS-SMBD]. The behaviors are generally version-specific and therefore may change in future Windows releases or fixes.

A previous blog discusses “CIFS and SMB Timeouts in Windows”:
https://blogs.msdn.com/b/openspecification/archive/2013/03/19/cifs-and-smb-timeouts-in-windows.aspx
 
NOTE: For questions on MS-SMB2, MS-SWN, MS-SMBD documents, please post in the Open Specifications Forum: Windows Protocols at https://social.msdn.microsoft.com/Forums/en-US/os_windowsprotocols.

Given an SMB2 file sharing scenario, these are frequent troubleshooting questions on timeouts:
- What timeouts are involved?
- What are the related Windows behaviors?
- What timers are configurable and what are their settings in Windows?
Just as a refresher, the following are the Windows SKUs where SMB dialects 2.x and 3.0 were introduced.
Dialect 2.002, Windows Vista and Windows Server 2008.
Dialect 2.1, Windows 7 and Windows Server 2008 R2.
Dialect 3.0, Windows 8 and Windows Server 2012.
All these SMB 2.x and 3.0 dialects share the same core SMB2 Packet format [MS-SMB2].

Request Expiration Timer [MS-SMB2]

This is the amount of time the client waits for the server to respond to an outstanding request. This timeout value can be adjusted through the following registry setting:
\HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\
Value type: Dword
Value name: SessTimeout
Default: 60 seconds (Windows Vista)

When the client does not receive the response to a request before the Request Expiration Timer expires, it will reset the connection because the operation is considered blocked.

NOTE: The client may choose this timeout based on local policy, the type of request, and network characteristics. One example is the implementation choice introduced in Windows 8 for SMB2 Negotiate when continuous availability (CA) is active. When negotiating with CA cluster servers, the Negotiate request timeout is set to a smaller value, e.g. the maximum of 10 seconds and SessTimeout/6. This allows a fast failover: when a CA server is not responding, the SMB 3.0 client can quickly fail over to the other node. Recall that CA requires SMB 3.0 or later.
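
As a rough illustration of the Negotiate rule in the note above, the following Python sketch computes the timeout from SessTimeout. It is not the Windows implementation. The function name and the ca_active flag are hypothetical, and the sketch assumes the ordinary SessTimeout applies when CA is not involved.

```python
def negotiate_timeout_seconds(sess_timeout: int = 60, ca_active: bool = False) -> float:
    """Illustrative only: timeout applied to an SMB2 NEGOTIATE request."""
    if not ca_active:
        # Assumption: without CA, the regular Request Expiration Timer applies.
        return float(sess_timeout)
    # With CA cluster servers: the maximum of 10 seconds and SessTimeout/6.
    return max(10.0, sess_timeout / 6)
```

With the default SessTimeout of 60 seconds, the CA case yields 10 seconds. Raising SessTimeout to 120 seconds would yield 20 seconds.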

If a request is being processed asynchronously, i.e. the server sends an interim response with Status set to STATUS_PENDING and SMB2_FLAGS_ASYNC_COMMAND bit set in Flags, Windows clients extend the time-out as follows:

• If the asynchronous operation is SMB2 Directory Change Notification, the client will not enforce a timeout. 
• Otherwise, if the client is running at least Windows 7 and ExtendedSessTimeout is configured, the timeout is extended to the value of ExtendedSessTimeout:
\HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\
Value type: Dword
Value name: ExtendedSessTimeout
• Otherwise, if the client is running at least Windows 7 and ExtendedSessTimeout is not configured, the timeout is extended to four times the value of SessTimeout (4 * SessTimeout). By default, ExtendedSessTimeout is not configured.

For example, an asynchronous write operation typically expires if a backend Windows Server 2008 R2-based storage server takes more than 4 minutes (4 * 60-second default SessTimeout, plus the scanning time needed to detect that the request expired) to complete the operation. Increasing SessTimeout effectively extends the time allowed for asynchronous operations.
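
The extension rules for asynchronous requests can be summarized in a short Python sketch. This is only an illustration of the rules listed above, assuming a client running Windows 7 or later. The function name and the "CHANGE_NOTIFY" command label are hypothetical, and the extra scanning time mentioned in the example is not modeled.

```python
from typing import Optional


def async_timeout_seconds(command: str,
                          sess_timeout: int = 60,
                          extended_sess_timeout: Optional[int] = None) -> Optional[int]:
    """Return the extended timeout in seconds, or None if no timeout is enforced."""
    if command == "CHANGE_NOTIFY":
        # Directory Change Notification: the client does not enforce a timeout.
        return None
    if extended_sess_timeout is not None:
        # ExtendedSessTimeout is configured, so it takes precedence.
        return extended_sess_timeout
    # ExtendedSessTimeout is not configured (the default): 4 * SessTimeout.
    return 4 * sess_timeout
```

With default settings, an asynchronous write gets 4 * 60 = 240 seconds, which matches the 4-minute figure in the example.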

The client does not enforce this timer for the following commands:
• Named Pipe Read
• Named Pipe Write
• Asynchronous Directory Change Notifications
• Blocking byte range lock requests
• FSCTLs: FSCTL_PIPE_PEEK, FSCTL_PIPE_TRANSCEIVE, FSCTL_PIPE_WAIT

Note that SessTimeout and ExtendedSessTimeout also apply to Windows-based CIFS/SMB; see the previous blog. However, the use of ExtendedSessTimeout in SMB is controlled by the client configuration of ServersWithExtendedSessTimeout rather than by a server response.

Session Expiration Timer [MS-SMB2]

This timer sets the interval at which the server scans sessions and marks them as expired once their specific expiration time is reached. The interval is 45 seconds in Windows-based servers.
If a session is in expired state and a request is received, the server should return STATUS_NETWORK_SESSION_EXPIRED and the client must re-authenticate. However, while a session is in expired state, the server processes requests in the following cases:
- LOGOFF, CLOSE, and LOCK (unlock), which allow a graceful teardown.
- SESSION_SETUP for re-authentication.
- Windows releases prior to Windows 8 do not fail a signed request, i.e. a request whose SMB2 header has SMB2_FLAGS_SIGNED set in the Flags field, provided the request is not an SMB2 LOCK.
Authentication-specific expiration is driven by the authentication package. See previous blog on “CIFS and SMB Timeouts in Windows” for more details. Session.ExpirationTime is set to the value returned by SSPI AcceptSecurityContext.
Note that for a given connection object, if the SessionTable remains empty between two cycles of session expiration timer, Windows-based servers will scavenge and disconnect the connection.
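
The expired-session cases in this section can be sketched as a small decision function. This is an illustration only. The function name, the command labels, and the parameter names are hypothetical, and the sketch does not model the authentication package or the scavenging of idle connections.

```python
def processed_while_expired(command: str, *,
                            is_unlock: bool = False,
                            signed: bool = False,
                            server_pre_windows8: bool = False) -> bool:
    """True if the server processes the request on an expired session,
    False if it would return STATUS_NETWORK_SESSION_EXPIRED."""
    if command in ("LOGOFF", "CLOSE", "SESSION_SETUP"):
        # Graceful teardown and re-authentication are always allowed.
        return True
    if command == "LOCK":
        # Only the unlock form of LOCK is processed.
        return is_unlock
    # Releases prior to Windows 8 do not fail signed requests
    # (SMB2_FLAGS_SIGNED set) other than LOCK.
    return signed and server_pre_windows8
```

For instance, a signed READ is processed by a pre-Windows 8 server but is failed by a Windows 8 or later server, so the client must re-authenticate first.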

Resilient Open Scavenger Timer [MS-SMB2]
This feature was introduced with SMB 2.1 in Windows 7.
This timer is started when the transport connection associated with a resilient handle is lost. It controls the amount of time the server keeps a resilient handle active after the transport connection to the client is lost.
A resilient handle/open is meant to survive temporary transport network disruption. If the client re-establishes connection in a reasonable time after the connection was lost, the client can reconnect to the handle. A client marks a handle resilient via SMB2 IOCTL with CtlCode FSCTL_LMR_REQUEST_RESILIENCY. Note that Windows does not check the negotiated dialect when processing this FSCTL.
The Open.ResiliencyTimeout is set as follows:
- If a non-zero value is supplied in the Timeout field of the NETWORK_RESILIENCY_REQUEST request, that value is used. If the requested timeout is greater than MaxResiliencyTimeout, the server returns STATUS_INVALID_PARAMETER.
- Otherwise, an implementation-specific value is used for resiliency timeout. Windows 7 and Windows Server 2008 R2 servers keep the resilient handle open indefinitely when the Timeout value (requested in NETWORK_RESILIENCY_REQUEST) is equal to zero. Windows 8 and Windows Server 2012 set a default value of 120 seconds.
The MaxResiliencyTimeout value can be configured through:
\HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\
Value type: Dword
Value name: ResilientTimeout
Default: 300 seconds (Windows 7, Server 2008 R2, 8, Server 2012)
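
The selection of Open.ResiliencyTimeout described above can be sketched in Python as follows. This is a simplified illustration, not the server code. The function and parameter names are hypothetical, every value is expressed in seconds for readability (the on-wire encoding of the Timeout field is defined in [MS-SMB2] and is not modeled), None stands for "kept open indefinitely", and a ValueError stands in for STATUS_INVALID_PARAMETER.

```python
from typing import Optional


def resiliency_timeout_seconds(requested: int,
                               max_resiliency_timeout: int = 300,
                               windows8_or_later: bool = True) -> Optional[int]:
    """Return the resiliency timeout in seconds, or None for 'indefinitely'."""
    if requested > max_resiliency_timeout:
        # The server rejects requests above MaxResiliencyTimeout.
        raise ValueError("STATUS_INVALID_PARAMETER")
    if requested != 0:
        # A non-zero requested value is used as is.
        return requested
    # Zero requested: the implementation-specific behavior applies.
    # Windows 8 / Server 2012 use 120 seconds; Windows 7 / Server 2008 R2
    # keep the handle open indefinitely.
    return 120 if windows8_or_later else None
```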

Durable Open Scavenger Timer [MS-SMB2]
This feature was introduced with SMB 2.1 in Windows 7.
This timer is started when the transport connection associated with a durable handle is lost. It controls the amount of time the server keeps a durable handle active after the transport connection to the client is lost.
A durable handle/open allows the client to attempt to preserve and reestablish a file handle after a network disconnection. A client requests an open to be durable through one of the create contexts SMB2_CREATE_DURABLE_HANDLE_REQUEST or SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2 (v2 requires SMB 3.0 dialect).
The durability timeout is set as follows:
- For SMB2_CREATE_DURABLE_HANDLE_REQUEST, Windows 7 and Windows Server 2008 R2 set this timeout to 16 minutes; Windows 8 and Windows Server 2012 set it to 2 minutes.
- For SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2, the timeout is set in the following order:
  a) A non-zero value is supplied in the Timeout field of the durable v2 create context request. The Timeout in the response is set to the minimum of the requested timeout and an implementation-specific maximum value. Windows 8.1 and Windows Server 2012 R2 set this maximum to 300 seconds. Windows 8 and Windows Server 2012 set it to the Timeout of the request, i.e. the requested value is used unchanged.
  b) A non-zero value is configured on the share’s CATimeout property.
  c) The server’s implementation-specific value. Windows-based servers use the value of the registry setting:
     \HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\
     Value type: Dword
     Value name: DurableHandleV2TimeoutInSecond
     Default: 60 seconds (Windows 8, Windows Server 2012)
     Default: 180 seconds (Windows 8.1, Windows Server 2012 R2)
     Maximum: 300 seconds
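
The order of precedence for the durable v2 timeout can be sketched as below. This is an illustration of the a), b), c) order above under simplifying assumptions. The function and parameter names are hypothetical, all values are treated as seconds, and max_timeout=None models the Windows 8 / Windows Server 2012 behavior where the requested value is not capped.

```python
from typing import Optional


def durable_v2_timeout_seconds(requested: int = 0,
                               share_ca_timeout: int = 0,
                               registry_default: int = 180,
                               max_timeout: Optional[int] = 300) -> int:
    """Pick the durable handle v2 timeout following the a), b), c) order."""
    if requested != 0:
        # a) Requested value, capped by the implementation-specific maximum.
        return requested if max_timeout is None else min(requested, max_timeout)
    if share_ca_timeout != 0:
        # b) The CATimeout property of the share.
        return share_ca_timeout
    # c) The DurableHandleV2TimeoutInSecond registry value (or its default).
    return registry_default
```

For example, a request for 600 seconds against a Windows Server 2012 R2-style configuration is capped at 300 seconds, while a request with Timeout set to zero on a share with no CATimeout falls through to the registry default.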
              
Continuous Availability Timeout
This feature was introduced with SMB 3.0 in Windows 8.
With SMB 3.0, each share has a CATimeout property which defines the minimum time the server should hold a persistent handle on a continuously available share before closing the handle if it is un-reclaimed. By default, Windows 8 and Windows Server 2012 set CATimeout to zero.
CATimeout can be set or retrieved using PowerShell command Set-SmbShare or Get-SmbShare.
Each share’s CATimeout needs to be configured to a sensible value to enable the SMB 3 client to perform transparent file handle recovery during server failovers.
In the event of server failover, the persistent handle may have timed out before the client reconnects to the clustered server and attempts to reclaim the handle. If that occurs, the client may replay an outstanding Read, Write, or IOCTL operation by using a stale handle which no longer exists on the server side. 
Ideally, if a persistent handle times out, the client should abandon the outstanding operation and return an error to the application.

Witness keep-alive interval [MS-SWN]
This functionality was introduced for SMB 3.0 in Windows 8.
The witness protocol is used to explicitly notify a client of resource changes that have occurred on a highly available cluster server. This enables faster recovery from unplanned failures, so that the client does not need to wait for TCP timeouts.
The server advertises the support of witness protocol monitoring through the SMB2 TREE_CONNECT response capability flag SMB2_SHARE_CAP_CLUSTER. The client instructs its witness client to register for asynchronous notifications for desired resources on the cluster node it is not connected to. The witness (server) service listens and reports cluster events related to the clustered file server that the client is connected to.
When the client registers (i.e. WitnessrRegister), the server assigns a registration key (a unique UID) that is used for subsequent requests on that context handle. A normal client shutdown (e.g. of LanmanWorkstation) triggers WitnessrUnRegister and clears the associated state information on both sides.
However, if the client crashes or gets disconnected, the witness service is notified of the disconnection by the RPC runtime. The witness service uses a default RPC keep-alive interval that can be configured via the following registry setting:
\HKLM\SYSTEM\CurrentControlSet\Services\SMBWitness\Parameters\
Value type: Dword
Value name: KeepAliveInterval
Default: 20 minutes (Windows 8, Windows Server 2012)
Upon receipt of disconnection notification, the witness service will implicitly unregister the client.
When the client comes back online after it crashed, it will register again since it has lost its state information.
If the client simply lost the connection and reconnected before the server noticed, the client cancels any outstanding WitnessrAsyncNotify, in case the RPC runtime is still holding its state, and then issues a new RPC call.

SMB-Direct timers [MS-SMBD]

SMB-Direct is a new transport supported in Windows 8. It is designed to carry SMB2 over Remote Direct Memory Access (RDMA) Transport Protocol.

Negotiation Timer
This timer is per-connection. It controls the amount of time to:
- Establish a connection and complete negotiation. ConnectTimeoutInMs is the deadline for the remote peer to accept the connection request and complete SMB Direct negotiation.
- Accept negotiation: The SMB Direct Negotiate request should be received before AcceptTimeoutInMs expires. The server starts this timer as soon as it accepts the connection.
\HKLM\System\CurrentControlSet\Services\SmbDirect\Parameters
Value type: Dword
Value name: ConnectTimeoutInMs
Default: 120 seconds (Windows 8)
Value type: Dword
Value name: AcceptTimeoutInMs
Default: 5 seconds (Windows 8)

Idle Connection Timer
This timer is per-connection. It is the amount of time the connection can be idle without receiving a message from the remote peer. Before the local peer terminates the connection, it sends a keep alive request to the remote peer and applies a keep alive timer.
\HKLM\System\CurrentControlSet\Services\SmbDirect\Parameters
Value type: Dword
Value name: IdleConnectionTimeoutInMs
Default: 120 seconds (Windows 8)

Keep alive interval
This attribute is per-connection. It defines the timeout to wait for the peer response for a keep-alive message on an idle RDMA connection.
\HKLM\System\CurrentControlSet\Services\SmbDirect\Parameters
Value type: Dword
Value name: KeepaliveResponseTimeoutInMs
Default: 5 seconds (Windows 8)
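
To show how the idle connection timer and the keep-alive response timeout combine, here is a small worked example in Python. It is a sketch only. The function name is hypothetical, the defaults are the Windows 8 values quoted above converted to milliseconds (on the assumption, suggested by the "InMs" suffix, that the registry values are stored in milliseconds), and it assumes the keep-alive request is sent at the moment the idle timer expires.

```python
# Windows 8 defaults from the text, converted to milliseconds (assumed unit).
IDLE_CONNECTION_TIMEOUT_MS = 120 * 1000
KEEPALIVE_RESPONSE_TIMEOUT_MS = 5 * 1000


def idle_teardown_deadline_ms(last_receive_ms: int,
                              idle_timeout_ms: int = IDLE_CONNECTION_TIMEOUT_MS,
                              keepalive_timeout_ms: int = KEEPALIVE_RESPONSE_TIMEOUT_MS) -> int:
    """Time (ms) at which a connection to a silent peer would be terminated."""
    # The idle timer expires first; a keep-alive request is sent at that point.
    keepalive_sent_at = last_receive_ms + idle_timeout_ms
    # The peer then has the keep-alive response timeout to answer.
    return keepalive_sent_at + keepalive_timeout_ms
```

Under these assumptions, a peer that goes silent is disconnected roughly 125 seconds after its last message with the default settings.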

Send Credit Grant Timer
This timer is per-connection. It regulates the amount of time that the local peer waits for the remote peer to grant Send credits before disconnecting the connection. This timer is started when the local peer runs out of Send credits.
\HKLM\System\CurrentControlSet\Services\SmbDirect\Parameters
Value type: Dword
Value name: CreditGrantTimeoutInMs
Default: 5 seconds (Windows 8)

If any of these SMB-Direct timers expires, the local peer terminates the connection, then signals the connection loss to the RDMA provider.

References
[MS-SMB2]: Server Message Block (SMB) Protocol Versions 2 and 3
https://msdn.microsoft.com/en-us/library/cc246482.aspx
[MS-SWN]: Service Witness Protocol
https://msdn.microsoft.com/en-us/library/hh536748.aspx
[MS-SMBD]: SMB2 Remote Direct Memory Access (RDMA) Transport Protocol
https://msdn.microsoft.com/en-us/library/hh536346.aspx