ALTER TABLE SWITCH fails with 4947 even when all the publicly exposed metadata for the indexes in source and target tables perfectly match

A colleague who is a consultant in Chile, reported a weird scenario he was observing in one customer and for which he couldn’t articulate an explanation. I saw his question posted in one of our internal distribution lists but couldn’t put any work on it at the time because I didn’t have the time to.

Today, I was able to look into it and find out an explanation to what he described. This is exactly how he described the problem:

I’m trying to execute a partition switch between 2 tables in my database but I keep getting an error about not having an identical index in the source table when I execute SWITCH. 

ALTER TABLE [SchemaName].[TableOneName] SWITCH TO [SchemaName].[TableTwoName]

Msg 4947, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. There is no identical index in source table 'DBName.SchemaName.TableOneName' for the index 'IndexName' in target table 'DBName.SchemaName.TableTwoName.

I have scripted these two tables but I don’t see anything different in the Primary Key, clustered index definition and compression settings, do you know anything else that I need to check to find the root cause of this error?

After that, he provided us with a backup of the database with which he was able to reproduce the error.

So, I attached WinDBG to my running instance of SQL Server 2012, configured the debugger to break when the exception was thrown, and reproduced the problem. After some stack unwinding and walking through previous code paths, I noticed that SQL Server had determined the two clustered indexes were different because the metadata services (one internal component of the product in charge of managing just that: catalog’s metadata) reported that the total number of columns in the two indexes didn’t match. It reported that the source table contained one more column than the target.

However, querying sys.syscolumns, sys.index_columns or any of the subjacent system tables (only visible through a DAC session, like sys.syscolpars) didn’t show any mismatch for the columns associated with the clustered indexes of these two tables.

Digging further down in the code, I discovered that there were a few hard-coded system columns which describe certain attributes for some objects. One of these columns is used to handle Peer-To-Peer Replication Conflicts and it happened to be present in one of the two tables but not in the other. Apparently, sometime ago, the customer had configured what acted as the source table in the DDL statement, to be published using transactional replication with P2P conflict detection support. The database was backed up, and restored in a different instance. So, it was not even evident from the UI in SSMS, for example, that the database had any objects which had participated in a replication topology originally.

After having discovered this to be the cause, I was able to correlate a documentation bug which has recently been fixed and will show up in the next documentation refresh release (has been addressed in 2008, 2008 R2, and 2012). The change consist of including a note of caution in the ALTER TABLE topic saying that you should not enable partition switching for Peer-to-Peer publications, due to the hidden column which is used to detect and resolve conflict.

This is definitely an improvement on top of what we had until today explaining this behavior: nothing. But still, in my opinion, the catalog metadata interfaces in the engine should expose enough details to the execution layer of ALTER TABLE, so that it can raise a less obscure and more specific error when this condition is detected. And with that, I’ve decided to request such reasonable change in a future major release.