DReplay Message: “Active connections exceed 8192, connection 8409 is waiting.”


Tracking down this message led to an interesting dive into the DReplay session boundary logic that I thought I would share.

Internally, DReplay maintains a progressive session queue. This queue is limited to 8192 entries and is populated in connection replay order based on the connect/disconnect boundaries. A background worker maintains the queue for the replay workers, adding new sessions and cleaning up completed sessions.

DReplay is designed to allow 8192 concurrent sessions to replay. For the capture, this means you must have 8192 or fewer concurrent entries in sys.dm_exec_sessions. Exceeding the limit can result in the message and the wait state.

If you actually have more than 8192 sessions, and the first 8192 require synchronization with the 8193rd, 8194th, … session(s), the replay can stall because the 8193rd, 8194th, … session won’t have a replay worker until one of the previous 8192 sessions completes. This does not mean the replay will be stuck forever. Query completion, session completion, query timeouts, kill scripts and similar actions can be used to achieve forward progress.

Why Do I Get This Message With Only 100 Concurrent Sessions Doing Make/Break?

You can encounter the message with fewer than 8192 concurrent sessions. The logic is to prepare sessions to be executed (read ahead, if you will). If, as a group, the current 8192 sessions take longer to replay than it takes to prepare sessions, the background worker will reach the limit and sleep until a session slot is available.

Here is an example where the background worker reaches the 8192 limit and waits. Three sessions complete their replay activities, the background worker prepares three new sessions, and it again reaches the 8192 limit. The messages show forward progress and that the “read ahead” limit has been reached. The background worker limits the queue size and waits in order to avoid potential memory and resource limitations.


2013-07-05 19:00:16:255 CRITICAL     [Client Replay]       Active connections exceed 8192, connection 8467 is waiting.
2013-07-05 19:00:16:255 INFORMATION  [Client Replay]       All events for spid=298 have been replayed
2013-07-05 19:00:16:255 INFORMATION  [Client Replay]       All events for spid=362 have been replayed
2013-07-05 19:00:16:255 INFORMATION  [Client Replay]       All events for spid=293 have been replayed
2013-07-05 19:00:16:255 CRITICAL     [Client Replay]       Active connections exceed 8192, connection 8470 is waiting

I would also point out that ‘connection #### is waiting’ is a nice progress indicator. Earlier in the log, DReplay reports the number of dispatched connections:

2013-07-05 18:59:47:956 OPERATIONAL  [Client Replay]       35212 events are dispatched in 8800 connections.

From the messages above you can see DReplay has prepared sessions up to 8469 of the 8800 total to be replayed.
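As a rough illustration of using these two messages as a progress indicator, here is a small Python sketch that scans log text for them. The function name `replay_progress` and the regular expressions are my own assumptions based only on the log lines quoted in this post; other DReplay versions may word the messages differently.

```python
import re

# Sample lines copied from the log excerpts quoted in this post.
LOG = """\
2013-07-05 18:59:47:956 OPERATIONAL  [Client Replay]       35212 events are dispatched in 8800 connections.
2013-07-05 19:00:16:255 CRITICAL     [Client Replay]       Active connections exceed 8192, connection 8467 is waiting.
2013-07-05 19:00:16:255 INFORMATION  [Client Replay]       All events for spid=298 have been replayed
2013-07-05 19:00:16:255 CRITICAL     [Client Replay]       Active connections exceed 8192, connection 8470 is waiting
"""

# Patterns are assumptions derived from the message text shown above.
DISPATCHED = re.compile(r"(\d+) events are dispatched in (\d+) connections")
WAITING = re.compile(r"connection (\d+) is waiting")


def replay_progress(log_text):
    """Return (prepared, total) connections, or None if a message is missing."""
    total = None
    waiting = None
    for line in log_text.splitlines():
        match = DISPATCHED.search(line)
        if match:
            total = int(match.group(2))
        match = WAITING.search(line)
        if match:
            waiting = int(match.group(1))  # keep the most recent value
    if total is None or waiting is None:
        return None
    # The waiting connection is not prepared yet, so the last prepared
    # connection is the one just before it.
    return waiting - 1, total


progress = replay_progress(LOG)
if progress is not None:
    prepared, total = progress
    print(f"prepared {prepared} of {total} connections")
```

Run against the excerpts above, this reports the same 8469 of 8800 figure read off the log by hand.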

One reproduction of the message used 100 concurrent make/break connections, each repeating 160 times, for a total of 16000 sessions. DReplay sees these as 16000 unique sessions and sequences them accordingly. In doing this, DReplay will queue 8192 sessions, wait for sessions to complete, add a few more, and repeat the logic. The message in this case is simply showing that you have more than 8192 connect/disconnect boundaries (unique sessions) and that DReplay has reached the prepared depth limit.
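To make the read-ahead behavior concrete, here is a toy Python model of the reproduction. It is not DReplay's implementation: the function `simulate`, the tick-based loop, and the prepare/complete rates are all hypothetical. Only the 8192 limit and the 100 x 160 workload come from the post. The model assumes the background worker prepares sessions faster than the replay workers finish them.

```python
QUEUE_LIMIT = 8192  # queue limit stated in the post


def simulate(concurrent, repeats, prepare_per_tick, complete_per_tick,
             limit=QUEUE_LIMIT):
    """Toy model: return (total_sessions, connections that had to wait).

    prepare_per_tick and complete_per_tick are made-up rates, not
    measurements taken from DReplay.
    """
    total = concurrent * repeats  # each connect/disconnect pair is a session
    prepared = 0
    completed = 0
    waits = []
    while completed < total:
        # Background worker: read ahead until the queue is full.
        for _ in range(prepare_per_tick):
            if prepared == total:
                break
            if prepared - completed >= limit:
                waits.append(prepared + 1)  # this connection is waiting
                break
            prepared += 1
        # Replay workers: finish some of the prepared sessions.
        completed += min(complete_per_tick, prepared - completed)
    return total, waits


total, waits = simulate(concurrent=100, repeats=160,
                        prepare_per_tick=1000, complete_per_tick=100)
print(f"{total} sessions, limit reached {len(waits)} times")
```

Even though only 100 sessions are ever concurrent in the original workload, the model reaches the limit because preparation outpaces replay. A workload with fewer than 8192 total sessions never reaches it.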

Bob Dorr – Principal SQL Server Escalation Engineer      

Comments (5)

  1. Aleksey Fomchenko says:

    Hi there,
    Do you have any idea why the replay rate using DReplay in ‘synchronization’ mode is much slower than in production? Sometimes it looks like the process is completely stuck and doing nothing.
    I am using 12 Clients and DO NOT see messages like the one you described, ‘Active connections exceed 8192, connection 8470 is waiting’. There are no issues or noticeable problems in any of the Client and Controller logs, and no explicit bottlenecks. I am using an AWS environment.

    Appreciate all your thoughts about it.

    My replay config is pretty normal:

    synchronization

    60
    3600

    Yes
    SPID

    Yes
    No

    Cheers

    1. Aleksey Fomchenko says:

      Sorry, XML has been heavily formatted by the forum.
      Here it is one more time:

      synchronization

      60
      3600

      Yes
      SPID

      Yes
      No

    2. Aleksey Fomchenko says:

      SequencingMode>synchronization603600YesSPIDYesNo<

    3. Aleksey Fomchenko says:

      Bloody xml…
      One more time (SETTINGS):

      SequencingMode – synchronization
      ConnectTimeScale – empty
      ThinkTimeScale – empty
      HealthmonInterval – 60
      QueryTimeout – 3600
      ThreadsPerClient – empty
      EnableConnectionPooling – Yes
      StressScaleGranularity – SPID

      RecordRowCount – Yes
      RecordResultSet – No

      Thank you for any assistance.

    4. Aleksey Fomchenko says:

      Sorry guys, the issue is about ‘Active connections exceed 8192’.
      Unfortunately, for some reason this information appeared in the output logs only after I forced the replay services to stop.

      But anyway, it is quite unusual if you wish to replay a bigger volume of workload. For example I am replaying 24 hours from the production. The trace file is about 10GB. And I do not have so many concurrent connections for sure.

      Feel free to remove my previous comments.