Rapid growth of Tempdb on AlwaysOn secondary replica due to version store


Hello all,

 

Recently I had worked on an interesting issue with rapid growth of Tempdb on AlwaysOn secondary replica. In this article, we will cover the approach and the queries used to isolate the growth issue and the workaround/fix used to control the growth of Tempdb on secondary replica.

 

The following types of objects can occupy tempdb space:

  • Internal objects
  • Version stores
  • User objects

In the case of disk-based tables, readable secondary replicas can require space in tempdb for two reasons:

  • Snapshot isolation level copies row versions into tempdb.
  • Temporary statistics for secondary databases are created and maintained in tempdb. The temporary statistics can cause a slight increase in the size of tempdb.

 

When the Secondary replica is setup with Readable secondary setting to TRUE at AG level, there is 14 byte overhead on every row inserted/updated on the primary database. Along with that SQL Server maintains versions of the rows in tempdb database and maps the isolation level to SNAPSHOT ISOLATION level on secondary replica. The row version data is not generated by the primary databases. Instead, the secondary databases generate the row versions. However, row versioning increases data storage in both the primary and secondary databases.

 

The scenario we had worked on, Tempdb data file had grown to 500GB and version store space was around 495GB. We used the below query to calculate the version store space usage.

select getdate() AS runtime, SUM (user_object_reserved_page_count)*8 as usr_obj_kb,
SUM (internal_object_reserved_page_count)*8 as internal_obj_kb,
SUM (version_store_reserved_page_count)*8  as version_store_kb,
SUM (unallocated_extent_page_count)*8 as freespace_kb,
SUM (mixed_extent_page_count)*8 as mixedextent_kb
FROM sys.dm_db_file_space_usage

There was no reporting workload running on the secondary replica. To identify the transactions currently maintaining an active version store, we used the below query:

select GETDATE() AS runtime,a.*,b.kpid,b.blocked,b.lastwaittype,b.waitresource,db_name(b.dbid) asdatabase_name,
b.cpu,b.physical_io,b.memusage,b.login_time,b.last_batch,b.open_tran,b.status,b.hostname,
b.program_name,b.cmd,b.loginame,request_id
from sys.dm_tran_active_snapshot_database_transactions a
inner join sys.sysprocesses b
on a.session_id = b.spid

From the above query we retrieved the below information:

commit_sequence_num: Null
session ID: 40
is_snapshot: 0

 

From sysprocesses, we identified that session ID (40) belonged to system thread which was performing REDO work on the secondary replica.

 

We isolated the issue to be because of high number of updates happening on the primary database. On secondary replica, the redo thread was trying to catch up with the primary and was maintaining multiple versions of the rows updated to achieve read consistency.

Application team confirmed that there was no reporting workload configured on secondary replica database. So we disabled readable secondary option as a workaround for the secondary replica. Post implementing the workaround SQL stopped maintaining versions of the rows on secondary replica and the space growth was controlled.

 

Hope the information shared above helps in case you encounter Tempdb growth issues in AlwaysOn setup.

 

Please share your feedback, questions and/or suggestions.

Thanks,
Don Castelino | Premier Field Engineer | Microsoft

Disclaimer: All posts are provided AS IS with no warranties and confer no rights. Additionally, views expressed here are my own and not those of my employer, Microsoft.

 

Comments (5)

  1. Dave Rendón says:

    very informative, thanks for sharing!

  2. Tony YMX says:

    I also encounter the same issue on SQL Server 2014 Always on Secondary, but we have reporting workload on it, so can’t disable the readable option on the secondary , any good solution in my case ? thanks.

  3. Tony YMX says:

    In addition, session id =43 , database name is one of availability database, we have more than 10 availability databases, why is the only one DB causes version store on tempdb , any idea, thanks.

    1. Hello Tony,

      If you are not able to change the readable secondary option to FALSE , then options available are:
      1. Ensuring that there are no long running update transactions happening on primary.
      2. Adequately sizing the tempdb on the secondary replica.

      Kindly refer the article which has additional details on capacity planning considerations when readable secondary is enabled: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/active-secondaries-readable-secondary-replicas-always-on-availability-groups?view=sql-server-2017

      1. Tony YMX says:

        Hi Don
        Thanks, I added some disk space for the tempdb as temporary solution but as you know the version store size continued to go up , since our case was in urgent , I supposed the suspected session was related to AG database sync from the description waittype(REDO_THREAD_PENDING_WORK), if we can find out a way to cancel the session then the version store space should be released.
        To reduce the impact on other DBs business system, I didn’t restart the server but requested about 30Mins downtime (off-hours) on the reporting DB agreement with business and other app team, removed the DB from AG and then joined it to the AG again on secondary, finally the suspected session gone and issue has been fixed, no impact on primary and other AG databases.
        But I still have a question why this type of background session related to AG DB redo_thread could prevent version store released ?

Skip to main content