Rapid growth of Tempdb on AlwaysOn secondary replica due to version store

Hello all,

 

I recently worked on an interesting issue involving rapid growth of tempdb on an AlwaysOn secondary replica. This article covers the approach and the queries used to isolate the growth, and the workaround used to control tempdb growth on the secondary replica.

 

The following types of objects can occupy tempdb space:

  • Internal objects
  • Version stores
  • User objects

In the case of disk-based tables, readable secondary replicas can require space in tempdb for two reasons:

  • Snapshot isolation level copies row versions into tempdb.
  • Temporary statistics for secondary databases are created and maintained in tempdb. The temporary statistics can cause a slight increase in the size of tempdb.

 

When a secondary replica is configured as a readable secondary at the AG level, every row inserted or updated in the primary database carries a 14-byte overhead. In addition, SQL Server maps the isolation level of queries on the secondary replica to snapshot isolation and maintains the row versions in tempdb on that replica. The row version data is not generated by the primary databases; the secondary databases generate the row versions. Even so, row versioning increases data storage in both the primary and secondary databases.
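
To confirm which replicas are configured as readable, the catalog views can be queried directly. The following is a minimal sketch rather than the query used in this case; the column aliases are my own choice.

```sql
-- List every replica in each availability group and show whether it
-- accepts connections while in the secondary role.
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.secondary_role_allow_connections_desc  -- NO, READ_ONLY, or ALL
FROM sys.availability_replicas AS ar
INNER JOIN sys.availability_groups AS ag
    ON ar.group_id = ag.group_id;
```

A value of READ_ONLY or ALL means the replica is readable when it is a secondary, so the versioning behaviour described above applies to it.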

 

In the scenario we worked on, the tempdb data file had grown to 500 GB, of which the version store accounted for about 495 GB. We used the query below to calculate the version store space usage.

-- Run in the context of tempdb. Counts are in 8 KB pages, so multiplying by 8 gives KB.
SELECT GETDATE() AS runtime,
       SUM(user_object_reserved_page_count) * 8     AS usr_obj_kb,
       SUM(internal_object_reserved_page_count) * 8 AS internal_obj_kb,
       SUM(version_store_reserved_page_count) * 8   AS version_store_kb,
       SUM(unallocated_extent_page_count) * 8       AS freespace_kb,
       SUM(mixed_extent_page_count) * 8             AS mixedextent_kb
FROM sys.dm_db_file_space_usage;
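
The query above reports the version store total for tempdb as a whole. On newer builds (to my understanding, SQL Server 2016 SP2 and SQL Server 2017 onward), the sys.dm_tran_version_store_space_usage DMV breaks that total down per database. This is a sketch that assumes the DMV is available on your build; it was not part of the original troubleshooting.

```sql
-- Version store space in tempdb, broken down by the database that
-- generated the row versions. Largest consumer first.
SELECT DB_NAME(database_id) AS database_name,
       reserved_page_count,
       reserved_space_kb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;
```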

There was no reporting workload running on the secondary replica. To identify the transactions currently maintaining an active version store, we used the below query:

-- Transactions that generate or may access row versions, with session details
SELECT GETDATE() AS runtime, a.*,
       b.kpid, b.blocked, b.lastwaittype, b.waitresource,
       DB_NAME(b.dbid) AS database_name,
       b.cpu, b.physical_io, b.memusage, b.login_time, b.last_batch,
       b.open_tran, b.status, b.hostname, b.program_name, b.cmd,
       b.loginame, b.request_id
FROM sys.dm_tran_active_snapshot_database_transactions AS a
INNER JOIN sys.sysprocesses AS b
    ON a.session_id = b.spid;

From the above query we retrieved the below information:

commit_sequence_num: NULL
session_id: 40
is_snapshot: 0

 

From sysprocesses, we identified that session ID 40 belonged to a system thread that was performing REDO work on the secondary replica.
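
sys.sysprocesses is a compatibility view, so the same check can also be made with the execution DMVs. The following is a hedged sketch of an equivalent lookup, not the query used in this case; session ID 40 is taken from the output above and should be replaced with the session ID you observe.

```sql
-- Inspect the session holding the version store open.
-- is_user_process = 0 indicates a system session.
SELECT r.session_id,
       s.is_user_process,
       r.command,
       r.status,
       r.wait_type,
       DB_NAME(r.database_id) AS database_name
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_sessions AS s
    ON r.session_id = s.session_id
WHERE r.session_id = 40;  -- session ID from the earlier output
```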

 

We isolated the cause to a high volume of updates on the primary database. On the secondary replica, the redo thread was trying to catch up with the primary and was maintaining multiple versions of the updated rows to provide read consistency.
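
One way to see how far the redo thread is behind is to look at the redo queue on the secondary replica. This is a minimal sketch, assuming it is run on the secondary; it was not part of the original investigation.

```sql
-- Per-database synchronization state on the local replica.
-- Sizes are reported in KB and redo_rate in KB per second.
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc,
       log_send_queue_size,  -- log not yet sent from the primary
       redo_queue_size,      -- log received but not yet redone
       redo_rate
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;
```

A large or steadily growing redo_queue_size suggests that redo is lagging behind the update activity on the primary.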

The application team confirmed that no reporting workload was configured against the secondary replica database, so we disabled the readable secondary option for that replica as a workaround. After the workaround was implemented, SQL Server stopped maintaining row versions on the secondary replica and the space growth was brought under control.
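
The readable secondary option can be turned off from SSMS or with T-SQL. The statement below is a sketch of the T-SQL route; [MyAG] and N'SQLNODE2' are placeholder names that do not come from the original environment and must be replaced with your availability group and secondary replica names.

```sql
-- Run on the primary replica.
-- [MyAG] and N'SQLNODE2' are hypothetical names used for illustration.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = NO));
```

Only apply this when nothing needs to read from the secondary, because it stops the replica accepting connections in the secondary role.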

 

I hope the information shared above helps if you encounter tempdb growth issues in an AlwaysOn setup.

 

Please share your feedback, questions and/or suggestions.

Thanks,
Don Castelino | Premier Field Engineer | Microsoft

Disclaimer: All posts are provided AS IS with no warranties and confer no rights. Additionally, views expressed here are my own and not those of my employer, Microsoft.