Create Availability Group Fails With Error 35250 'Failed to join the database'

You can create an availability group using the New Availability Group wizard or using a series of transact-sql commands. A common failure can occur when attempting to create an availability group using either method. The error is as follows and is caused because the target primary and secondary replicas are unable to communicate:

Msg 35250, Level 16, State 7, Line 1
The connection to the primary replica is not active.  The command cannot be processed.

If creating the availability group using the New Availability Group wizard, the 'Joining <availability database> to availability group <availability group> at <replica>' will fail. Clicking the Error link, message 35250 is reported: 

 

If you used transact-sql to create the availability group instead of the New Availability Group wizard, the ALTER DATABASE...SET HADR AVAILABILITY GROUP... command fails with message 35250:

Following is a list of common causes for this failure and steps to diagnose each.

Reason - Inbound Port 5022 Traffic is Blocked

IMPORTANT: By default, AlwaysOn configures the database mirroring endpoints to use port 5022, and the following document will use 5022 when discussing the database mirroring endpoint in use for availability groups. However, this may not be the configured port for the database mirroring endpoints in your environment.  Query sys.tcp_endpoints on each replica to confirm which port is being used:

select name, type, port from sys.tcp_endpoints where type_desc='database_mirroring'
go

By default inbound traffic is blocked in Windows firewall.

Port 5022 is used by the primary and secondary replicas for synchronization and communication purposes. Traffic must be allowed inbound on this port. Testing has shown that if port 5022 inbound traffic is blocked at the primary, at the secondary or at both, you will be unable to create the availability group and message 35250 will be reported.

Reason - Endpoint is not created or started

Ensure the mirroring endpoints are created and started on the primary and the secondary replicas.

To detect if the endpoints are created, query for database_mirroring endpoints at the primary and the secondary.

:Connect SQLNODE1
select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING'
go
:Connect SQLNODE2
select name, state_desc, port from sys.tcp_endpoints where type_desc='DATABASE_MIRRORING'
go

If you find an endpoint does not exist, create it:

:Connect SQLNODE1
create endpoint [Hadr_endpoint]
 state=started
 as tcp (listener_port = 5022, listener_ip = all)
 for database_mirroring (role = all, authentication = windows negotiate, encryption = required algorithm aes)
go

To detect if the endpoints are started, query them at the primary and the secondary.

:Connect SQLNODE1
select name, state_desc, port FROM sys.tcp_endpoints where name='hadr_endpoint'
go
:Connect SQLNODE2
select name, state_desc, port from sys.tcp_endpoints where name='hadr_endpoint'
go

If you find an endpoint is not running, start it:

 :Connect SQLNODE1
alter endpoint [Hadr_endpoint] state = started

IMPORTANT: There is a known issue in which sys.tcp_endpoints.state_desc may incorrectly report the endpoint as STARTED when it is not started. It is a good idea to execute the ALTER ENDPOINT command for the endpoint at each replica, despite the report given by sys.tcp_endpoints, queried above.

Reason - Endpoint permissions

If database mirroring endpoints are configured to use Windows authentication, ensure that the SQL Server instances hosting your availability replicas run with a SQL Server startup account are domain accounts.

The New Availability Group wizard catches this misconfiguration, and reports it. Note that the Endpoints tab is displayed and the SQL Server Service Account reports that one of the replica's startup account is configured for LocalSystem.

If you proceed by clicking Yes and complete the wizard, it will fail during the 'Joining <availability database> to availability group <availability group> at <replica>.'

If you are attempting to create an availability group using transact-sql, this configuration problem will not be detected and command ALTER DATABASE...SET HADR AVAILABILITY GROUP...  will fail with 35250. 

Reason - SQL Server is not listening on port 5022

If SQL Server is unable to listen on port 5022, the New Availability Group wizard and the transact-sql command ALTER DATABASE...SET HADR AVAILABILITY GROUP... will fail with 35250. 

 To determine if SQL Server is listening on port 5022, review the SQL Server error log. You should find the following message(s) in the SQL Server error log:

2013-12-09 08:52:25.47 spid23s     Server is listening on [ 'any' <ipv6> 5022].
2013-12-09 08:52:25.47 spid23s     Server is listening on [ 'any' <ipv4> 5022]. 

SQL Server may not be able to listen on port 5022 if another application is already listening on the port. If you find that SQL Server is not listening on port 5022 because it is already being used, run 'netstat -a' to determine what application is using the port:

Additional diagnostics - query sys.dm_hadr_availability_replica_states on the secondary

 If the above settings check out, query the sys.dm_hadr_availability_replica_states for the last_connect_error_number which may help you diagnose the join issue. Depending on which replica was having difficulty communicating, you should query both the primary and secondary for the local replica:

select r.replica_server_name, r.endpoint_url,
rs.connected_state_desc, rs.last_connect_error_description,
rs.last_connect_error_number, rs.last_connect_error_timestamp
from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r
on rs.replica_id=r.replica_id
where rs.is_local=1

For example, if the secondary was unable to communicate with the DNS server or if a replica's endpoint_url was configured incorrectly when creating the availability group, you may get the following results: