Log Reader Agent Fail on Always On availability Groups configured on Multi-subnet

Recently I was working with one of Microsoft Partner on following issue wherein customer was migrating to SQL SERVER 2016 and during testing they have found Log Reader Agent was failing. I would like to share my learning from this issue and let me know if it is helpful or if you have any question.

Consider following scenario:

  • You have Always On availability Groups configured on Multi-subnet.
  • You have AlwaysOn Listener configured and running with RegisterAllProvidersIP 1 to support Modern application which use MultiSubnetFailover in connection string.
  • You have Transactional replication configured with remote distributor.
  • You have redirected Original Publisher to the AG Listener Name using “sp_redirect_publisher” stored procedure

In this scenario Log Reader Agent may fail and log following error message:

Error messages:

Unable to query the redirected server ‘SQLAGLS’ for original publisher ‘SQLPUB’ and publisher database ‘PUBDB’ to determine the name of the remote server; Error 7303, Error message ‘Error 7303, Level 16, State 1, Message: Cannot initialize the data source object of OLE DB provider “SQLNCLI11” for linked server “[xxxx]”.’. (Source: MSSQL_REPL, Error number: MSSQL_REPL21879) Get help: http://help/MSSQL_REPL21879

Errors were logged when validating the redirected publisher. (Source: MSSQL_REPL, Error number: MSSQL_REPL22037) Get help: http://help/MSSQL_REPL22037

 

So in this scenario Log Reader Agent failing because replication use OLE DB provider to connect to Publisher Server which does not support MultiSubnetFailover yet. So when RegisterAllProvidersIP set to 1 for Listener resource in that case it register all dependent IP in DNS. So when OLEDB provider query SQLAGLS then DNS may return IP which is not online hence Log Reader Agent may fail.

I would recommend reading following blog to understand more about RegisterAllProvidersIP & HostRecordTTL parameters and it’s impact on connection in multi-subnet scenario:

https://blogs.msdn.microsoft.com/alwaysonpro/2014/06/03/connection-timeouts-in-multi-subnet-availability-group/

Solution

So in this current scenario if we set RegisterAllProvidersIP to 0 which mean return only active IP in that case Log Reader Agent will work fine however during AG failover it will cause connectivity issue because client operating systems will not resolve the AG Listener name to the new IP address until the currently cached entry expires and modern app will not be able to take advantage of  MultiSubnetFailover. If we reduce HostRecordTTL value to some lower number then this will increase the traffic on DNS.

So what is our best option?

Considering this scenario instead of using existing AG Listener we can create a dedicated new Listener for replication and set following parameters:

Get-ClusterResource SQLAGLSREPL| Set-ClusterParameter HostRecordTTL 120

Get-ClusterResource SQLAGLSREPL |Set-ClusterParameter RegisterAllProvidersIP 0

So RegisterAllProvidersIP 0 will ensure AG Listener return only active IP and HostRecordTTL 120 will restore connectivity quickly (120 seconds or less) in case of AG failover to different subnet.

Considering you are using this listener only for Replication hence setting HostRecordTTL to lower number will not cause large amount of traffic to DNS.

 

Vikas Rana | Twitter | Linkedin | Support Escalation Engineer
Microsoft India GTSC