Flow Control in Always On Availability Groups – what does it mean and how do I monitor it?


I was recently asked whether Flow Control, which
existed in Database Mirroring, is still relevant for Availability Groups. It is.

Flow Control is primarily a mechanism to gate or throttle
messages to avoid excessive resource use on the primary or secondary.
While we are in “Flow Control” mode, sending of log block messages from the Primary to
the Secondary is paused until flow control mode is reset.

 

A Flow Control gate or threshold exists at 2 places:

– Availability Group Replica/Transport – 8192 Messages

– Availability Group Replica Database – 112*16 = 1792 Messages per database, subject to the 8192 total limit at the Transport or Replica level

 

When a log block is captured, every message sent over the
wire carries a Sequence Number, which is a monotonically increasing number.
Each packet also includes an acknowledgement number, which is the sequence
number of the last message received or processed at the other side of the
connection. With these two numbers, the number of outstanding messages can be
calculated to see whether a large number of messages remain unprocessed. The message
sequence number also ensures that messages are sent in
sequence. If the messages are out of sequence, the session is torn down and
re-established.
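To make the bookkeeping concrete, here is a minimal Python sketch of the two numbers described above. The `Connection` class and its field names are hypothetical, and the sketch assumes sequence numbers increase by exactly 1 per message; the engine's actual internals are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical model of one side of a replica-to-replica connection."""
    last_sent_seq: int = 0      # sequence number of the last message we sent
    last_acked_seq: int = 0     # highest sequence number the partner acknowledged
    last_received_seq: int = 0  # sequence number of the last message we received

    def outstanding(self) -> int:
        # Messages sent but not yet acknowledged by the other side.
        return self.last_sent_seq - self.last_acked_seq

    def receive(self, seq: int) -> bool:
        # Assumption: in-order means "exactly last received + 1".
        # Returning False stands in for tearing down the session.
        if seq != self.last_received_seq + 1:
            return False
        self.last_received_seq = seq
        return True
```

A connection that has sent message 120 and seen an acknowledgement for message 100 has 20 messages outstanding.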

 

From an Availability Replica perspective, either the Primary
or the Secondary replica can signal that we are in Flow control mode.

 

On the Primary, when we send a message, we check the
number of unacknowledged messages that we have sent, which is the delta
between the Sequence Number of the message sent and that of the last acknowledged message. If
that delta exceeds a pre-defined threshold value, that replica or database is
in flow control mode, which means that no further messages are sent to the
secondary until the flow control mode is reset. This gives the secondary some
time to process and acknowledge the messages and allows whatever resource
pressure exists on the secondary to clear up.
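The Primary-side check can be sketched in a few lines of Python using the two thresholds quoted earlier in this post. The function and constant names are illustrative only, and the strict "greater than" comparison is an assumption based on the word "exceeds".

```python
TRANSPORT_LIMIT = 8192     # Replica/Transport gate quoted earlier in this post
DATABASE_LIMIT = 112 * 16  # 1792, the per-database gate quoted earlier


def unacknowledged(last_sent_seq: int, last_acked_seq: int) -> int:
    # Delta between the last message sent and the last one acknowledged.
    return last_sent_seq - last_acked_seq


def in_flow_control(last_sent_seq: int, last_acked_seq: int, limit: int) -> bool:
    # Assumption: the gate trips only when the delta strictly exceeds the limit.
    return unacknowledged(last_sent_seq, last_acked_seq) > limit
```

For example, `in_flow_control(2000, 100, DATABASE_LIMIT)` is true because 1900 unacknowledged messages exceed the per-database limit of 1792.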

 

On the Secondary, when we reach a low threshold of Log
caches or when we detect memory pressure, the secondary passes a message to the
primary indicating it is low on resources. When the SECONDARY_FLOW_CONTROL message
is sent to the primary, a bit is set on the primary for the database in
question, indicating it is in Flow control mode. The primary then skips this
database when doing its round-robin scan of databases to send data.

 

Once we are in “flow control” mode, until that is reset, we
do not send messages to the secondary. Instead, we check every 1000 ms for a
change in flow control state. On the secondary, for example, if the log caches are flushed
and additional buffers are available,
the secondary will send a flow control disable message indicating it no longer
needs to be flow controlled. Once the primary gets this message, that bit is
cleared and messages will again flow for the database in question. On the
Transport or Replica side, on the other hand, flow control is reset once the number of unacknowledged messages falls
below the gated threshold.
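The per-database bit and the round-robin skip can be pictured with a small Python sketch. `PrimaryScheduler` and its methods are hypothetical names chosen for illustration; the sketch models only the set and clear behavior described above, not the 1000 ms polling or the real message format.

```python
class PrimaryScheduler:
    """Hypothetical sketch of the primary's per-database flow control bits."""

    def __init__(self, databases):
        self.databases = list(databases)
        self.flow_controlled = set()  # databases whose bit is currently set

    def on_secondary_message(self, database, enable):
        # enable=True models SECONDARY_FLOW_CONTROL arriving from the secondary;
        # enable=False models the later flow control disable message.
        if enable:
            self.flow_controlled.add(database)
        else:
            self.flow_controlled.discard(database)

    def next_round(self):
        # One round-robin pass: skip every database whose bit is set.
        return [db for db in self.databases if db not in self.flow_controlled]
```

With three databases, setting the bit for the second one leaves only the first and third in the next send pass; clearing it restores all three.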

While we are in Flow control mode, perfmon counters and wait
types can give us the amount of time we are in flow control mode.

 

Wait Types:

http://msdn.microsoft.com/en-us/library/ms179984.aspx

 

HADR_DATABASE_FLOW_CONTROL

Waiting for messages to be sent to
the partner when the maximum number of queued messages has been reached.
Indicates that the log scans are running faster than the network sends. This
is an issue only if network sends are slower than expected.

HADR_TRANSPORT_FLOW_CONTROL

Waiting when the number of
outstanding unacknowledged AlwaysOn messages is over the out flow control
threshold. This is on an availability replica-to-replica basis (not on a
database-to-database basis).
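Wait statistics in sys.dm_os_wait_stats are cumulative since the last restart or reset, so a common way to see how much flow control happened during a test window is to take two snapshots and subtract them. The Python sketch below assumes you have already read `waiting_tasks_count` and `wait_time_ms` for each wait type into a dictionary; the snapshot shape and function name are hypothetical.

```python
FLOW_CONTROL_WAITS = (
    "HADR_DATABASE_FLOW_CONTROL",
    "HADR_TRANSPORT_FLOW_CONTROL",
)


def flow_control_delta(before, after):
    """Each snapshot maps wait_type -> (waiting_tasks_count, wait_time_ms)."""
    result = {}
    for wait_type in FLOW_CONTROL_WAITS:
        # A wait type missing from a snapshot is treated as zero waits.
        count0, ms0 = before.get(wait_type, (0, 0))
        count1, ms1 = after.get(wait_type, (0, 0))
        result[wait_type] = (count1 - count0, ms1 - ms0)
    return result
```

The result gives, per wait type, the number of waits and the total wait time in milliseconds accumulated between the two snapshots.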

 

Perfmon counters:

http://msdn.microsoft.com/en-us/library/ff878472.aspx

 

Flow Control Time (ms/sec)

Time in milliseconds that log stream
messages waited for send flow control, in the last second.

Flow Control/sec

Number of times flow-control
initiated in the last second. Flow Control Time (ms/sec) divided by Flow
Control/sec is the average time per wait.
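The division described above is simple, but a sampled Flow Control/sec of zero needs guarding. A minimal Python sketch, with a hypothetical function name and a choice (an assumption, not documented behavior) to report 0.0 when no flow control occurred in the sample:

```python
def average_flow_control_wait_ms(flow_control_time_ms_per_sec, flow_control_per_sec):
    # No flow control events in this sample: report 0.0 rather than divide by zero.
    if flow_control_per_sec == 0:
        return 0.0
    return flow_control_time_ms_per_sec / flow_control_per_sec
```

For example, a sample of 300 ms/sec of Flow Control Time with a Flow Control/sec of 4 works out to an average of 75 ms per wait.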

 

Extended Events

There are 2 Extended Events which give us the relevant
information when we are in Flow control mode. Note that they are under the
Debug Channel.

The action reported by each event is essentially a bit: “set=0” or “cleared=1”.


 

Denzil Ribeiro – Sr Premier Field Engineer


 
