I don’t know the exact details of what he was trying to implement, but basically, he had a main table (tBase in his simplified repro) in which different users were inserting rows into. At a given point in time, through a maintenance session and as part of a single atomic transaction, this main table would be swapped with another staging table (tStage in his repro) with identical schemes, through a simple renaming process (using sp_rename). Later through that same maintenance session, it would do whatever processing of the data that had been originally inserted in the main table, but that was visible now through the staging table, and will end up trying to truncate the staging table.
Well, it happened that from the flow described above, the TRUNCATE TABLE blocked until some outstanding transactions, which were running INSERTs against the main table, completed. And that blocking situation was something that surprised us all. Let’s examine why it was occurring just like that.
The detailed description of the repro follows here, supported with a few annotated screen captures:
Two tables are created from one session in autocommit mode, meaning that each DDL statement initiates its own transaction which is committed upon successful completion of the statement. The third statement is intended to gather the IDs assigned to both objects.
Then, from another session, an implicit transaction is initiated and a row is inserted into what was called the tBase table. The transaction is left active (neither commit nor rollback is issued), so that we can control the timing when each transaction is resolved and facilitate the repro.
As we can observe in the following screenshot, session 65 acquired an IX (Intent eXclusive) lock on the object whose id is 21575115, which corresponds to the table whose name is tBase.
From a new session (session 61) someone tries to swap the names of tables tBase and tStage. To do so, an intermediate object name tTemp is used as shown in the following image. Notice in the lower left corner of the query window, that the execution of the selected batch of statements doesn’t complete, most probably because it is being blocked by another session.
Indeed, session 61 is waiting to acquire the Sch-M (Schema Maintenance) lock on the tBase object which it requires to change its name into tTemp. And it will have to wait until session 65 releases the IX lock it acquired in order to be able to run the insert.
From another new session (session 56) a transaction begins, and an insert into the tBase table is attempted. But it blocks indefinitely.
The reason why that last session from where another INSERT is running is blocked, is because it is waiting in a queue, just behind session 61 (the one waiting for the Sch-M lock on the same resource), to acquire an IX lock on the object with ID 21575115 that, at the moment, corresponds to object named tBase.
Now, when we COMMIT the transaction we left active in session 65, from where the first INSERT on tBase had happen, the IX lock on object with ID 21575115 is released…
…and session 61 can continue executing once it could acquire the Sch-M lock it was waiting for.
But session 56 is still blocked as is expected to be, until session 61 releases the Sch-M lock on object with ID 21575115.
When session 61 COMMITs the transaction, the Sch-M lock it held on object with ID 21575115 is released.
And so the blocked INSERT that was run from session 56 can complete.
However, the object that was referred as tBase when the DML begun executing (the one with ID 21575115) is not the same object anymore after it was swapped with tStage (the one with ID 37575172). That is basically considered a change in the schema of the object and triggers a recompilation of the INSERT statement that was blocked. During the execution of that new plan, an IX lock is acquired now on the object with ID 37575172 (i.e. tBase once renamed).
Now, this is the key point you must take from this post: you would expect that the IX lock over object with ID 21575115 would be released by now, right? And that only the IX lock over object with ID 37575172 would be held, right?
Well, it seems that’s not the case. Both locks remain held despite it doesn’t make sense to continue holding the one on object with ID 21575115.
Therefore, when from a different session someone tries to truncate tStage table, it has to wait.
Until it can acquire a Sch-M lock on the object which is incompatible with the IX lock held by session 56.
Through source code analysis, two things can be concluded: 1) the code from the Relational Engine that governs table locks held at execution time (code responsible for table lock hints, schema locks, and their duration – short vs. long) have to be revamped to avoid this particular undesired behavior and a few other behaviors we already knew about, and 2) there is a workaround that makes this scenario work the way one would expect (i.e. the recompiled INSERT releasing the IX lock on the original object before acquiring the IX lock on the new object). Despite how counterintuitive this workaround may appear to be, it works, which is better than having no better offering.
The workaround consists in specifying the READCOMMITTED locking hint on the tBase table referenced by the INSERT, as in this example:
INSERT INTO tBase WITH (READCOMMITTED) VALUES (‘A different test value’)
By providing that locking hint, it forces the aforementioned function from Relational Engine where the duration of the schema lock is decided to be short (one that doesn’t have to wait for the transaction to complete before it can be released) rather than long (aka EOT, which stands for End Of Transaction).
You might ask: Do I have to specify such locking hint even if the session isolation level is Read Committed if I want it to work that way? And the answer is: Yes. And this is the part that I mentioned could not seem, and in fact is not, intuitive at all.
Hope this information is useful or that, at the very least, it has resulted entertaining or to some extent interesting.