We often see cases where a customer has millions of rows in a table and, when they need to add or alter a column, they hit the "transaction log full" error.
Depending on the Azure SQL Database tier, there are two important factors to be aware of for completing this operation: the transaction log size and the IOPS available to the transaction log.
It is important to note the following points about transaction log behavior in the Azure SQL Database engine:
- All databases use the full recovery model, and this cannot be changed. That means every transaction is recorded in the transaction log.
- Every 5-10 minutes the Azure SQL Database engine takes a transaction log backup in order to prevent problems with the transaction log.
- The Azure SQL Database engine shrinks the transaction log automatically.
- The transaction log space used is not counted as part of the data space used by the database.
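To keep an eye on both factors while a large operation runs, you can query two DMVs that are available in Azure SQL Database (run them in the affected database):

-- Current transaction log size and percentage used for this database
SELECT total_log_size_in_bytes / 1048576.0 AS total_log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_space_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- Recent log-write throughput as a percentage of the tier's limit
SELECT end_time, avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;

If avg_log_write_percent sits near 100, the operation is throttled by the log IOPS of your tier, not only by the log size.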
In this situation, our customer got the transaction-log-full error while adding a new PK column to their table. But, before solving the issue, let me explain why it happens:
- When you change a table definition this way (for example, from the SSMS table designer), SQL Server creates a new table, copies the data from the old table to the new one, and then rebuilds the indexes, all inside a single transaction. For this reason you will see blocking, high IO and DATA IO consumption, and the transaction log can need two or three times the space of the data being moved (remember, we are in the full recovery model).
Let me show you an example and how to fix it:
- I created a table called DatosSource with two columns: ID (int) and Data (nchar(10)).
- I changed the ID column from int to bigint and captured what the SQL Server engine does under the covers. As you can see, everything happens in a single transaction, and all the data goes through the transaction log (full recovery model, and it is not possible to change it). Although I could have used Azure SQL Auditing, I used an on-premises SQL Server with SQL Server Profiler to show it; the behavior is the same as in Azure SQL Database. The steps are:
- Create a new table called TMP_DatosSource.
- Lock the entire table against new updates.
- Insert the data into TMP_DatosSource from DatosSource.
- Drop the DatosSource table.
- Rename TMP_DatosSource back to DatosSource.
- Rebuild the indexes of the new DatosSource table, if needed.
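The steps above can be sketched in T-SQL roughly like this. This is a simplified version of what the designer scripts, not the exact generated code; the locking hints are illustrative:

BEGIN TRANSACTION;

-- 1. Create the new table with the changed column definition
CREATE TABLE dbo.TMP_DatosSource (ID bigint, Data nchar(10));

-- 2. Copy all rows; the whole copy is logged inside this one transaction
INSERT INTO dbo.TMP_DatosSource (ID, Data)
SELECT ID, Data
FROM dbo.DatosSource WITH (TABLOCKX, HOLDLOCK);

-- 3. Swap the tables
DROP TABLE dbo.DatosSource;
EXEC sp_rename 'dbo.TMP_DatosSource', 'DatosSource';

COMMIT TRANSACTION;
-- 4. Rebuild/recreate indexes and constraints on dbo.DatosSource as needed

Because the INSERT...SELECT runs as one logged transaction, the log cannot be truncated until it commits, which is exactly why the log fills up on large tables.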
- Even after scaling the database up to Premium P15, our customer still got the issue, and they did not want to move to the Hyperscale tier. So, my suggestion was to reduce the transaction log usage by migrating the data in batches, keeping each transaction small and using multiple transactions.
- I created a new staging table called TMP_DatosSource with ID (bigint) and Data (nchar(10)).
- I populated the source table with this number of rows:

DECLARE @Times AS int = 0
WHILE @Times <= 100000000
BEGIN
    INSERT INTO [dbo].[DatosSource] (Id, Data) VALUES (@Times, '1')
    SET @Times = @Times + 1
END
- As an example, this is a good way to transfer the data without causing high consumption in the transaction log or in tempdb. My suggestion is to use a cursor; you can modify the SELECT statement to filter specific data and reduce the number of rows to transfer. Every 50,000 rows I execute a COMMIT TRANSACTION and start a new transaction, so each transaction stays small and the automatic log backups can free log space as the copy progresses. While the query runs you can watch the space usage. Once the process has finished, you can rename the table, create the indexes, and so on.
DECLARE @id int, @name nchar(10)
DECLARE @Total AS bigint = 0
TRUNCATE TABLE [TMP_DatosSource]
DECLARE vcursor CURSOR FOR SELECT ID, Data FROM DatosSource
OPEN vcursor
BEGIN TRANSACTION
FETCH NEXT FROM vcursor INTO @id, @name
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO [dbo].[TMP_DatosSource] VALUES (@id, @name)
    SET @Total = @Total + 1
    IF @Total > 50000
    BEGIN
        COMMIT TRANSACTION  -- close the batch so log space can be reused
        BEGIN TRANSACTION   -- start the next batch
        SET @Total = 0
    END
    FETCH NEXT FROM vcursor INTO @id, @name
END
COMMIT TRANSACTION
CLOSE vcursor
DEALLOCATE vcursor
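Once the copy finishes, the final swap could look like the sketch below. The constraint name is illustrative; recreate whatever indexes and constraints your real table had:

-- Swap the staging table in and recreate the primary key
DROP TABLE dbo.DatosSource;
EXEC sp_rename 'dbo.TMP_DatosSource', 'DatosSource';
ALTER TABLE dbo.DatosSource ADD CONSTRAINT PK_DatosSource PRIMARY KEY (ID);

Doing the rename and index creation after the batched copy keeps each of those operations in its own small transaction.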