Changing the Primary Key After You Have Data

Yesterday I blogged about changing a Primary Key (PK) during the design phase, and before you have data in the database. Even then, it’s not trivial to change the data type or column(s) that make(s) up the PK. When you have data in that Primary Key and/or you have Foreign Keys (FK) that point to a PK field, this becomes a much more involved process.

First, you MUST take a complete backup of the system, and you MUST do this work on a development system. You’re going to be manipulating base data, and in the worst kind of way. Why do I say that?

There are three kinds of data problems. One is physical data corruption, and once you restore it (I don’t trust repair with data loss) then you know where the “good” data stops and starts. You know that the FKs are pointing to the right PKs, and that the data is readable.

The second kind of data problem is “incorrect” entry, meaning that you put in “Buck Woodie” when you meant “Buck Woody”. This is also often correctable, as your Declarative Referential Integrity (DRI) will still keep that record together. It’s a simple matter to locate and correct.

But the third kind of data problem is the worst. In this kind, the FKs point to the wrong PK, or not at all. Or perhaps some of the FK data is missing. The reason this is the worst is that the data cannot be trusted – did Buck Woody really pay for that purchase or was that payment from a different record? If the links are incorrect, everything will look fine, but it won’t be correct.

So in the situation where you want to change the PKs when you already have data, one of the steps is to re-point the FKs to the proper PKs. Get this wrong and you’ll have “dirty data”. The only recourse there is to restore the backup and start over.

I’m assuming that you have a REALLY GOOD reason to make this change, and that you’ve taken that backup and you’re on a test system. As Hannibal Lector would say: “Okeedokee – here we go.”

I’ll describe two possible choices. In the first, understand and document the PK and FK relationships for each table. Then create queries that tie out the INSERTS for that data in the correct order, and include all fields for both tables, including the PKs in the parent table and the FKs in the child tables.

With those INSERT statements made, drop the PK and FK constraints on all of the tables using the script I mentioned yesterday. Make your change to the data type (or field) in the tables. Then edit the INSERT statements to have the new value types, ensuring that the “sets” or records are kept together by keeping the FK’s pointing to the right PKs. Then re-apply the PK and FK constraints, watching for errors. With backups and scripts, you can make corrections along the way.

The second method works much as the first, with the exception of using the INSERT statements. You can use SSIS or a third-party transfer process to actually move the data and change the data type or values “in flight”, although this requires more thought and planning in my experience.

In any case, you should test the results thoroughly and ensure that you’re getting the data you expect. Communicate what you’re doing, and be ready to fall back to your backups at the first sign of trouble.

Oh – since SQL Server 2005, you’ve been able to “relax” constraints to insert data. That won’t help you here, since you’re changing the entire key for whatever reason. That’s only meant to allow you to get data in when the key doesn’t change.

OK – feel free to comment on this post, since there are other ways I didn’t cover. But if you’re reading my material or any of the comments, make sure you test it yourself – I might have missed something, no one is perfect, your situation may vary. Once again, you can see why I value a good design and ERD so highly – that way you never have to get here to begin with!

Skip to main content